Abstract


Air Force doctrine requires reliable and accurate information when striking
targets. Further, this doctrine states that fusion should be utilized whenever possible to
ensure the best possible information is conveyed; however, it gives no specific guidance as
to how to fuse this information. This thesis extends the research found in Leap, Bauer, and
Oxley (2004) to include a non-declared class. The Identification System Operating
Characteristic (ISOC) was adapted to allow for non-declarations both at the individual
sensor level and at the fused output level. A probabilistic neural network (PNN) was
also used as a fusion technique. A cost function was developed that incorporated
misclassification error as well as non-declaration rules. In addition, a heuristic was
developed to find optimal rules through a likelihood ratio method. Finally, a sensitivity
analysis was performed.
AN INVESTIGATION OF THE EFFECTS OF CORRELATION AND
AUTOCORRELATION ON CLASSIFIER FUSION WITH NON-DECLARATIONS

1.  Introduction 

1.1  Background 

Correct discrimination of hostile and friendly forces is paramount to success in air
operations. Incorrectly classifying hostiles as friends, or vice versa, can cost the lives
of U.S. servicemen. Automatic target recognition (ATR) is one method of discriminating
targets. ATR consists of the following six steps: detection, location, combat
identification (CID), decision, execution, and assessment (AFPAM 14-210, 1998). Of these
six steps, CID is considered one of the most critical and challenging problems facing the
defense community today (Robinson and Aboutalib, 1989). Combat identification is a major
point of emphasis in the military’s kill chain, which consists of search, detect, track,
classify, etc. (Haspert, 2000). Multi-sensor fusion offers an avenue for improving
classification accuracy. Multi-sensor fusion combines information from multiple sources to
create inferences that cannot be achieved through single-source intelligence or
information (Hall and Steinberg, 2001). This improvement is supported by Air Force
guidance and other research efforts. In particular, Air Force targeting guidance states
that fusion should be used whenever possible to enhance intelligence support while adding
credibility and accuracy (AFPAM 14-210, 1998). One common assumption in fusion models is
independence both between classifiers and across features. Although this assumption holds
in some cases, there is a limited amount of research on what happens when information
collected from classifiers violates it (Willett et al., 2000). Past research and graduate
theses investigated the effects of correlation on different fusion methods. This research
extends the previous efforts of Storm et al. (2003) and Leap et al. (2004) to include an
“indifference zone.” The indifference region considers removal of exemplars within a
window corresponding to a high probability of misclassification. Two fusion models are
used to test performance with the addition of the indifference region. These models are
extensions of the Identification System Operating Characteristic (ISOC) fusion
(Haspert, 2002) and Probabilistic Neural Network (PNN) fusion. The ISOC assumes
independence of the classifiers, while the PNN does not rely on that assumption.

1.2  Problem  Statement 

In this research the effects of adding a third class, non-declared, are explored.
Data created in Matlab were used to structure a relatively easily separable
problem for a pilot study. Data created in Leap (2004) are used to test the addition of
non-declarations. Several problems are visited, each with increasing complexity.

The desired output is a useful rule for the specific problem geometries considered,
with the hope of gaining insight into the use of non-declarations in the field. Although
training, test and validation data were generated for this research, the methodology
should be applicable to real world scenarios.

1.3  Outline  of  Thesis 

This  research  consists  of  the  following  five  chapters:  Introduction,  Literature 
Review,  Methodology,  Findings  and  Analysis,  and  Conclusions.  A  succinct  chapter  by 
chapter  explanation  follows. 
Chapter 1: Introduction - The background for this research, the problem
statement, and the research objectives are developed.

Chapter 2: Literature Review - Reasons to fuse data, methods for fusing data
and their assumptions, and pertinent literature are reviewed, along with the data
generated in Leap (2004) that are used for testing and validation.

Chapter 3: Methodology - The fusion methods employed are addressed, including two
heuristics for adding non-declarations based on the cost of misclassification.

Chapter 4: Findings and Analysis - The bulk of the research, showing the results
obtained when the problems described in Chapter 3 were tested.

Chapter 5: Conclusion and Recommendations - A brief review of the research
results and ideas for follow-on research are presented.
2. Literature Review


2.1  Introduction 

This  chapter  focuses  on  relevant  literature  applicable  to  this  research.  This  review 
initially  discusses  the  need  for  classifier  fusion  as  found  in  Air  Force  Doctrine.  Next,  it 
explores  the  fusion  methods  utilized.  Then,  it  details  the  two  types  of  multi-classifier 
fusion  methods  utilized  in  this  study.  Finally,  the  data  sets  that  were  employed  are 
described  in  detail. 

2.2  Air  Force  Direction 

“Since the Wright Brothers first flew at Kitty Hawk, the airplane has continually
evolved as an instrument of military and national power. Today, the proper employment
of aerospace power is essential for success on and over the modern battlefield”
(AFDD 2-1, 2000). In order to achieve success, targets must be correctly identified,
leading to precision engagement of the enemy. Intelligence for targets should be based
upon multiple sources for improved accuracy and reliability (AFPAM 14-210, 1998). Air
Force doctrine clearly states that great care should be taken to minimize civilian
casualties while military objectives are correctly identified and attacked (AFPAM 14-210,
1998). Minimizing civilian casualties requires sound target intelligence, which enhances
military effectiveness by showing that the risks undertaken are militarily worthwhile
(AFPAM 14-210, 1998). Intelligence, surveillance, and reconnaissance (ISR) are critical
aerospace mission areas related to CID. ISR relies on fused information for accurate
intelligence suitable to deny adversary efforts at impeding information collection. Fused
information shows the big picture, allowing commanders a more lucid depiction of the
battlespace (AFDD 2-5.2, 1999). Combining multi-source data into necessary intelligence
useful in decision making is called fusion (AFPAM 14-210, 1998).

2.3  Fusion  Methods 

Identification System Operating Characteristic (ISOC) fusion and Probabilistic
Neural Networks (PNN) are the two techniques considered in this effort. The main
difference  in  the  above  methods  can  be  viewed  from  a  top  level  as  the  difference  between 
feature  level  and  decision  level  fusion.  Neural  networks  operate  in  a  manner  similar  to 
feature  level  fusion  (see  Figure  2-1  below).  Features  are  extracted  and  fused  in  the  chosen 
network  before  an  exemplar  is  classified  into  a  group  based  upon  a  probability  of  class 
membership.  Decision  level  fusion  first  labels  exemplars  at  the  individual  classifier 
level.  These  labels  are  then  fused  to  create  a  single  fused  indication  for  a  target.  The 
ISOC  fusion  method  is  one  example  of  decision  level  fusion.  Once  exemplars  are 
classified  into  output  labels  (hostile,  friend,  etc.),  they  are  compared  using  logical  rules. 
One  simple  rule  for  combining  these  output  labels,  assuming  only  two  classifiers,  would 
be  to  declare  a  target  as  hostile  only  if  both  classifiers  labeled  the  target  as  hostile 
(Robinson  and  Aboutalib,  1989).  Figure  2-1  addresses  the  difference  between  feature  and 
decision  level  fusion  in  more  detail. 
[Figure 2-1: Feature Level Fusion and Decision Level Fusion]


2.4  Multi-Classifier  Fusion  Levels 

One  major  assumption  of  decision  level  fusion  is  independence  of  classifiers, 
specifically  independence  of  feature  vectors  classified  by  classifiers  (Robinson  and 
Aboutalib,  1989).  Little  is  known  when  the  assumption  of  independence  does  not  hold 
for  a  given  feature  set  (Willett,  2000).  Robinson  and  Aboutalib  considered  the 
mathematical  implications  of  dependence  on  decision  level  fusion  techniques.  Assume  a 
population set contains two distinct classes, $C_1$ and $C_2$. Also assume the a priori
probabilities of class membership, $P(C_1)$ and $P(C_2)$, are known (Robinson and Aboutalib,
1989). The costs of decisions are as follows: $F(C_1,C_1) = a_{11}$ is the cost of a true
positive, $F(C_2,C_2) = a_{22}$ is the cost of a true negative, $F(C_1,C_2) = a_{12}$ is the
cost of a false positive, and $F(C_2,C_1) = a_{21}$ is the cost of a false negative. Assume
that $F(C_i,C_j) > F(C_i,C_i)$ for $i \neq j$ and $i = 1, 2$ (Robinson and Aboutalib, 1989).
With two features, a composite feature vector can be defined as
$X_m = [X_1, X_2]^T$. Through Bayes' rule and the expected value of the cost function, it
can be shown that an exemplar is more likely to belong to $C_1$ when

$$\frac{P(X_m \mid C_1)}{P(X_m \mid C_2)} > \frac{(a_{12} - a_{22})\, p(C_2)}{(a_{21} - a_{11})\, p(C_1)}$$

(Robinson and Aboutalib, 1989).

A decision rule from the above likelihood is thus:

$$\text{if } \frac{P(X_m \mid C_1)\, p(C_1)}{P(X_m \mid C_2)\, p(C_2)} > \frac{(a_{12} - a_{22})}{(a_{21} - a_{11})}, \text{ then } X \text{ is assigned to } C_1.$$

With regard to feature level fusion, if $a_{11} = a_{22}$ and $a_{12} = a_{21}$, the decision
rule above yields the “minimum probability of error decision” (Robinson and Aboutalib, 1989).
The right hand side of the inequality becomes a constant, allowing a global minimum to
be reached.
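
The feature level decision rule described above can be illustrated with a short sketch. The Python below is only an illustration under stated assumptions and is not taken from Robinson and Aboutalib (1989): the function name `assign_class` and all numeric likelihoods, priors, and costs are hypothetical. Costs follow the convention that a_ij is the cost of deciding C_i when the truth is C_j, with a21 > a11 and a12 > a22 assumed.

```python
def assign_class(lik_c1, lik_c2, prior_c1, prior_c2, a11, a12, a21, a22):
    """Return 1 if the exemplar is assigned to C1 under the minimum expected
    cost rule, otherwise 2.

    lik_c1, lik_c2     : P(X_m | C1) and P(X_m | C2) for the observed exemplar
    prior_c1, prior_c2 : a priori class probabilities
    a_ij               : cost of deciding C_i when the truth is C_j
                         (assumes a21 > a11 and a12 > a22)
    """
    # Cross-multiplied form of the likelihood ratio test, which avoids
    # dividing by a zero likelihood.
    lhs = lik_c1 * prior_c1 * (a21 - a11)
    rhs = lik_c2 * prior_c2 * (a12 - a22)
    return 1 if lhs > rhs else 2


# Hypothetical values: equal priors, zero cost for correct decisions.
equal_costs = assign_class(0.3, 0.2, 0.5, 0.5, a11=0, a12=1, a21=1, a22=0)
costly_false_positive = assign_class(0.3, 0.2, 0.5, 0.5, a11=0, a12=5, a21=1, a22=0)
```

With equal error costs the rule reduces to comparing prior-weighted likelihoods, so the exemplar goes to C1; raising the false positive cost a12 to 5 raises the threshold enough that the same exemplar is assigned to C2.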

Continuing  with  consideration  to  decision  level  fusion,  the  cost  function  to  be 
minimized  now  contains  two  decisions,  Di  and  D2  corresponding  to  the  labels  applied  at 
each  classifier.  The  expected  value  function  now  carries  added  terms  and  grows  more 
complex.  In  fact,  the  likelihood  decision  function  is  now  changed  to  the  following. 


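$$\text{if } \frac{P(C_1)\, p(X_1 \mid C_1)}{P(C_2)\, p(X_1 \mid C_2)} > \frac{\sum_{D_2 = C_1, C_2} \gamma_1 \int p(D_2 \mid X_2)\, p(X_2 \mid X_1, C_2)\, dX_2}{\sum_{D_2 = C_1, C_2} \gamma_2 \int p(D_2 \mid X_2)\, p(X_2 \mid X_1, C_1)\, dX_2}, \text{ then } X_1 \text{ is from } C_1,$$

where

$$\gamma_1 = [F(D_1 = C_1, D_2, A = C_2) - F(D_1 = C_2, D_2, A = C_2)]$$

and

$$\gamma_2 = [F(D_1 = C_2, D_2, A = C_1) - F(D_1 = C_1, D_2, A = C_1)]$$

(Robinson and Aboutalib, 1989).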

By inspection of the above inequality, to reach a global optimum one needs to know
$p(D_2 \mid X_2)$ and $p(X_2 \mid X_1, C_j)$ for all $X_1$ (Robinson and Aboutalib, 1989).
Thus, in order to reach a global optimum using decision level fusion with two possible
classes, one should not operate the classifiers independently, but should have prior
knowledge of one classifier before using the other. However, if the feature vectors are truly
independent  of  one  another,  the  right  hand  side  of  the  inequality  becomes  a  constant  as  in 
feature  level  fusion  and  classifying  each  classifier  independently  can  return  the  fused 
global  optimum  (Robinson  and  Aboutalib,  1989).  The  addition  of  a  third  unknown  class 
was  not  addressed  and  the  effects  of  correlation  were  uncertain. 

Further research by Willett, Swaszek, and Blum considered what level of
classifier processing of a Gaussian shift in mean problem was required to reach optimum
performance (Willett et al., 2000). Feature vectors and correlation coefficients were the
parameters considered between exemplars. With this problem, difficulties arising from
levels of statistical dependence resulted in several complicated rules. After partitioning
the space of Gaussian shift in mean problems into three regions named “the good,” “the
bad,” and “the ugly,” the research identified optimal rules based upon the partitioned
region of use (Willett et al., 2000). In particular, any problem in “the good” region
required statistical independence of feature vectors in order to reach optimality
(Willett et al., 2000). Outside of “the good” region, complex rules and problem specifics
define the rule needed, which might be able to operate without the assumption of
independence (Willett et al., 2000). Feature level fusion does not require independence
of its features; however, highly correlated features contribute little new information,
so feature selection might be useful.

2.5  ISOC 

The  Identification  System  Operating  Characteristic  (ISOC)  fusion  method  seeks 
the  lowest  operational  cost  for  a  given  threshold  (Ralston,  1999).  This  particular  rule  is 
then  applied  to  all  future  exemplars.  The  ISOC  differs  from  traditional  classifier  fusion 
methods  which  frequently  utilize  fixed  rules  in  seeking  a  minimum  cost  (Haspert,  2000). 
While  fixed  rules  remove  difficulty  in  terms  of  implementation,  they  often  do  not  reach  a 
global  optimum  solution  (Haspert,  2000).  Bayesian  techniques  have  the  ability  to 
produce  optimal  ID  classifier  fusion  rules  (Haspert,  2000).  Two  common  target  classes 
are  hostile  and  friend.  This  research  extends  to  a  third  target  class,  unknown. 

2.5.1  Classifier  Performance  Matrices 

Classifiers  take  an  exemplar  and  output  a  classification  label  as  shown  in  Table  2-1. 
Decision  level  fusion  methods  such  as  the  ISOC  take  these  matrices  from  all  available 
classifiers  and  make  a  fused  decision. 
Table 2-1: Single Classifier Confusion Matrix.

                          Indication
Truth    “H”                “F”                “ND”
H        P(“H”|H)  (TP)     P(“F”|H)  (FN)     P(“ND”|H)
F        P(“H”|F)  (FP)     P(“F”|F)  (TN)     P(“ND”|F)

The  above  confusion  matrix  displays  the  possible  outputs  from  a  single  sensor/classifier. 
The  rows  in  the  table  correspond  to  the  true  states  of  nature  for  targets  while  the  columns 
represent  the  declaration  made  by  the  classifier.  For  instance,  P(“ND”  |  H)  represents  the 
probability  of  this  classifier  declaring  the  target  as  unknown  when  it  was  in  fact  hostile. 
This  is  the  confusion  matrix  used  throughout  this  research  with  H  being  hostile,  F  being 
friend,  and  ND  being  unknown.  This  matrix  can  be  expanded  as  needed  to  match 
classifier  outputs  and  classes  possible.  Accurate  classifiers  will  carry  large  values  in 
P(“H”  |  H)  and  P(“F”  |  F)  and  small  probabilities  elsewhere  (Haspert,  2000). 

2.5.2  Combat  Identification  System  States 

Let $N_s$ be the total number of classifiers in a system and $i$ be the classifier in
consideration within the system with $1 \le i \le N_s$ (Ralston, 1998). Further, let $n_i$
be the number of classifier states of the $i$th classifier accounting for the rows of the
performance matrix shown in Table 2-2. Finally, let $k_i$ be the classification state of
the $i$th classifier with $1 \le k_i \le n_i$. Assuming there is some level of independence
across the classifiers, then $N = \prod_{i=1}^{N_s} n_i$ (Ralston, 1998). The following
classifier performance matrix in Table 2-2 represents the probabilities found from data
accumulated through exercises, tests or analyses (Ralston, 1998). It can easily be adapted
to handle more true classes as well as output states. Let $S_j$ denote the $j$th
configuration of the combat ID system (CID) where $S = \bigcup_{j=1}^{N} S_j$ represents all
possible configurations of the CID with $1 \le j \le N$ (Ralston, 1998).

Table 2-2: Classifier Performance Matrix.

j      $S_j$
1      $(s_1^1, s_2^1, s_3^1, \ldots, s_{N_s}^1)$
2      $(s_1^2, s_2^2, s_3^2, \ldots, s_{N_s}^2)$
3      $(s_1^3, s_2^3, s_3^3, \ldots, s_{N_s}^3)$
...    ...
N      $(s_1^N, s_2^N, s_3^N, \ldots, s_{N_s}^N)$

The classifier performance matrix can be used to create the probability matrix from Table
2-1. If there is negligible correlation among classifiers, classifier probability matrices
can be multiplied to create conditional probabilities of being in a state of the classifier
performance matrix $S_j$ given truth $T$ ($T \in \{H, F\}$). Thus, an equation for this
probability is $P(S_j \mid T) = \prod_{i=1}^{N_s} P(s_i^j \mid T)$, where $s_i^j$ denotes the
state of the $i$th classifier in the $j$th configuration (Leap, 2004; Ralston, 1998). It is
important to note that $\sum_j P(S_j \mid T) = 1$ such that
$\sum_j P(S_j \mid H) = \sum_j P(S_j \mid F) = 1$ (Ralston, 1998). The classifier performance
matrix for a two classifier, three output problem as used in parts of this research would
look like the one in Table 2-3.


Table 2-3: Sample Classifier Output State Matrix.

j      $(s_1^j, s_2^j)$
1      (H,H)
2      (H,U)
3      (H,F)
4      (U,H)
5      (U,U)
6      (U,F)
7      (F,H)
8      (F,U)
9      (F,F)
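
To make the construction of the joint state probabilities concrete, the following Python sketch multiplies two single classifier confusion matrices under the independence assumption and produces P(S_j | T) for the nine states in the order of Table 2-3. The confusion matrix values and the names `clf1`, `clf2`, and `state_probabilities` are hypothetical and chosen only for illustration.

```python
from itertools import product

LABELS = ("H", "U", "F")  # hostile, unknown (non-declared), friend

# Hypothetical confusion matrices: clf[truth][label] = P("label" | truth).
clf1 = {"H": {"H": 0.80, "U": 0.10, "F": 0.10},
        "F": {"H": 0.05, "U": 0.15, "F": 0.80}}
clf2 = {"H": {"H": 0.70, "U": 0.20, "F": 0.10},
        "F": {"H": 0.10, "U": 0.20, "F": 0.70}}


def state_probabilities(classifiers, truth):
    """Return P(S_j | truth) for every joint output state, assuming the
    classifiers are independent so that their probabilities multiply."""
    probs = {}
    for state in product(LABELS, repeat=len(classifiers)):
        p = 1.0
        for clf, label in zip(classifiers, state):
            p *= clf[truth][label]
        probs[state] = p
    return probs


p_given_h = state_probabilities([clf1, clf2], "H")
p_given_f = state_probabilities([clf1, clf2], "F")
```

Because each row of a confusion matrix sums to one, the nine joint probabilities for a given truth also sum to one, which is a convenient sanity check before any fusion rule is evaluated.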

2.5.3  Identification  Fusion  Rules 

A fusion rule consists of a vector relaying classifier output combinations to
declare in a class. This rule is made to resolve conflicting indications from independent
classifiers (Ralston, 1998). A complete identification (I.D.) fusion rule can be expressed
as a vector $R$. This $N$ dimensional vector $R = (r_1, r_2, r_3, \ldots, r_N)$ corresponds
to the different state outputs as shown in Table 2-3 above. For $j = 1, 2, \ldots, N$, each
$r_j \in \{0, 1\}$ denotes a declaration for that rule (Leap, 2004). For instance, for a two
classifier, two output system with states ordered ("H","H"), ("H","F"), ("F","H"),
("F","F"), a hostile rule could be $(1, 1, 0, 0)$. This rule would declare a target as
hostile “H” any time both classifiers labeled the target as hostile ("H","H") or only the
first classifier declares hostile ("H","F") (Ralston, 1998). The remaining states,
("F","H") and ("F","F"), will be classified as friends. The probability of declaring a
target as hostile given that it is hostile is
$P(\text{"H"} \mid H) = \sum_{j=1}^{N} P(S_j \mid H) \cdot r_j$ and the probability of
misclassifying the target as a hostile is
$P(\text{"H"} \mid F) = \sum_{j=1}^{N} P(S_j \mid F) \cdot r_j$ (Ralston, 1998). Figure 2-2
further depicts the ISOC fusion process.
Figure 2-2: ISOC Classifier Fusion Process (Forced Decision)
Complete enumeration of all possible rules can become daunting with the addition of
classifiers and/or states. For instance, a two classifier-three output system would contain
512 possible rules ($2^9$), but a three classifier-three output system would contain
134,217,728 possible rules ($2^{27}$). While $2^{27}$ rules can require days of run time
depending on machine speed, $2^9$ rules are manageable, and thus complete enumeration was
employed for the ISOC fusion forced decision method described in chapter three to find the
optimal rule set based on cost.
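
As a small illustration of how a hostile rule is scored and how quickly the rule space grows, the sketch below evaluates the rule (1, 1, 0, 0) for a two classifier, two output system and counts the rules available for larger systems. The state probabilities and the helper names `p_declare_hostile` and `rule_count` are hypothetical; the states are assumed to be ordered ("H","H"), ("H","F"), ("F","H"), ("F","F").

```python
# Hypothetical joint state probabilities for the four states
# (H,H), (H,F), (F,H), (F,F), given each truth.
p_state_given_h = [0.56, 0.24, 0.14, 0.06]
p_state_given_f = [0.01, 0.09, 0.09, 0.81]


def p_declare_hostile(rule, p_state_given_truth):
    """Sum P(S_j | truth) over the states that the rule declares hostile."""
    return sum(p * r for p, r in zip(p_state_given_truth, rule))


def rule_count(n_classifiers, n_outputs):
    """Number of binary hostile rules: one bit per joint output state."""
    n_states = n_outputs ** n_classifiers
    return 2 ** n_states


rule = (1, 1, 0, 0)
p_true_positive = p_declare_hostile(rule, p_state_given_h)   # P("H" | H)
p_false_positive = p_declare_hostile(rule, p_state_given_f)  # P("H" | F)

two_by_three = rule_count(n_classifiers=2, n_outputs=3)
three_by_three = rule_count(n_classifiers=3, n_outputs=3)
```

The rule count doubles with every additional joint state, which is why exhaustive search is practical for nine states but not for twenty-seven.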

2.5.4  Likelihood  Ratio  Approach  to  Rules  Selection 

In  order  to  achieve  the  best  classification  in  the  two  classifier  case,  one  must  look 
to  find  the  rule  that  maximizes  P("H"|  H)  while  minimizing  P("H"|  F) ;  a  solution  where 
the  greatest  number  of  true  targets  are  identified  with  the  least  number  of  false  alarms 
possible  for  the  two  class  problem  would  then  be  reached  (Ralston,  1998).  The  best 
possible  classifier  performance  would  be  to  correctly  identify  all  hostiles  and  friends. 

This  is  usually  not  feasible,  but  the  likelihood  ratio  described  below  allows  a  chance  to 
see  how  close  a  set  of  classifiers  are  to  that  perfect  classification.  The  likelihood  ratio 
method  considers  the  optimal  hostile  and  friend  rules  built  in  sequence  element  by 
element.  There  are  two  rules  that  can  easily  be  compared  to  the  optimal  classifier 
performance,  never  declare  any  targets  hostile  and  always  declare  all  targets  hostile 
(Haspert, 1998). The “always declare hostile” rule ensures that no targets are missed at
the cost of friendly casualties; this rule is accomplished by declaring all exemplars
hostile by setting all elements of the hostile rule $r_j = 1$ for all $j = 1, \ldots, N$
(Ralston, 1998). This ensures all hostile forces are engaged at the cost of the highest
level of fratricide possible. The most conservative “never declare hostile” rule is found
by setting all elements of the hostile rule $r_j = 0$ for all $j = 1, \ldots, N$. The next
most conservative rule will contain one $r_j = 1$ and all other $r_j = 0$, thus only
declaring one classifier output state as hostile. This rule uses the state with the largest
likelihood ratio, found by calculating $p(S_j \mid H) / p(S_j \mid F)$ for all $j$ and
sorting these ratios from greatest to smallest. States are added in order of their
likelihood ratios to the hostile rule until the “always declare hostile” rule is reached.
This process of adding states based on their likelihood of being hostile forms the optimal
ISOC boundary. The following algorithm further explains this ISOC boundary creation (Storm,
Bauer and Oxley, 2003; Leap, 2004).

1. Calculate $P(S_j \mid T)$ for all $j = 1, \ldots, 9$ and $T \in \{H, F\}$ from the
classifier confusion matrices.

2. Calculate $LR_j = P(S_j \mid H) / P(S_j \mid F)$ for all $j$, where $LR_j$ is the
likelihood ratio for state $j$ of the classifier output matrix.

3. Rank $LR_j$ from greatest to smallest such that
$LR_{j_1} \ge LR_{j_2} \ge \ldots \ge LR_{j_9}$, where $LR_{j_1}$ is the largest and
$LR_{j_9}$ is the smallest ratio.

4. Select $S_{j_n}$ corresponding to the largest $LR_{j_n}$ not yet included in the hostile
fusion rule (set $r_{j_n} = 1$ in $R$).

5. Go to 3 unless $r_j = 1$ for all $j$.

This algorithm “turns on” elements of the hostile rule in decreasing order of their
likelihood ratio (Ralston, 1998). The two extreme rules described above generate the end
points of the boundary and the above algorithm forms the rest of it in succession of
likelihoods. Thus, there will be $N + 1$ distinct rules forming the ISOC boundary
(Ralston, 1998). The ISOC fusion process using likelihood ratios is depicted in Figure 2-3.

Figure 2-3: ISOC Classifier Fusion by Likelihood Ratios.

This forms the basis for using likelihood ratios to generate optimal rules. This section
will be readdressed in chapter three with further implementation concerns.
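
The likelihood ratio procedure of this section can be sketched as follows. This Python is a minimal illustration rather than the implementation used in the cited works: the function name `isoc_boundary` and the state probabilities are hypothetical, a two classifier, two output system is assumed for brevity, and states with P(S_j | F) = 0 are given an infinite ratio as a convenience.

```python
def isoc_boundary(p_h, p_f):
    """Build the N + 1 hostile rules on the ISOC boundary.

    p_h[j] = P(S_j | H) and p_f[j] = P(S_j | F) for each joint state j.
    Returns a list of (rule, P("H"|F), P("H"|H)) tuples, starting with
    "never declare hostile" and ending with "always declare hostile".
    """
    n = len(p_h)
    # Likelihood ratio of each state; an impossible-under-friend state
    # is treated as infinitely indicative of a hostile (an assumption).
    ratios = [ph / pf if pf > 0 else float("inf") for ph, pf in zip(p_h, p_f)]
    # State indices sorted from the largest ratio to the smallest.
    order = sorted(range(n), key=lambda j: ratios[j], reverse=True)

    rule = [0] * n
    p_tp = p_fp = 0.0
    boundary = [(tuple(rule), p_fp, p_tp)]
    for j in order:
        rule[j] = 1          # "turn on" the next most hostile-like state
        p_tp += p_h[j]
        p_fp += p_f[j]
        boundary.append((tuple(rule), p_fp, p_tp))
    return boundary


# Hypothetical probabilities for states (H,H), (H,F), (F,H), (F,F).
p_h = [0.63, 0.07, 0.27, 0.03]
p_f = [0.01, 0.19, 0.04, 0.76]
boundary = isoc_boundary(p_h, p_f)
```

In this example the second classifier is the more reliable one, so the state (F,H) has a larger likelihood ratio than (H,F) and is added to the hostile rule first; the boundary contains five rules for the four states.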

2.5.5  Total  Cost  of  Misclassification 

In order to have a true comparison between rules, a metric must be defined. The metric
used herein calculates costs of misclassifications based on probabilities found in the
classifier probability matrix in Table 2-1 (the total cost function below does not include
non-declarations; “ND” in the table will be addressed in this context in chapter three).
For the two classifier-two output case, a total cost function is

$$C_T = C_{FN} \times P(H) \times P(FN) + C_{FP} \times P(F) \times P(FP) \qquad (2\text{-}1)$$

where

$C_T$ = Total cost for the rule being tested
$C_{FN}$ = Cost of a false negative
$P(H)$ = a priori probability of a target being hostile
$P(FN) = \sum_{j=1}^{N} P(S_j \mid H) \times \bar{r}_j$ = Probability of a false negative
$r_j$ = Element $j$ of rule $R$ (Defined in 2.5.6)
$\bar{r}_j = 1 - r_j$ (Complement of rule $R$)
$C_{FP}$ = Cost of a false positive
$P(F)$ = a priori probability of a target being a friend
$P(FP) = \sum_{j=1}^{N} P(S_j \mid F) \times r_j$ = Probability of a false positive

In order to compare rules allowing non-declarations, terms need to be added and will be
addressed in the methodology section.
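
The total cost metric for the forced decision case can be combined with complete enumeration in a few lines. The sketch below is illustrative only: the function name `total_cost`, the state probabilities, the priors, and the unit costs are hypothetical, and P(F) is taken as 1 - P(H) on the assumption that hostile and friend are the only true classes.

```python
from itertools import product


def total_cost(rule, p_h, p_f, prior_h, c_fn, c_fp):
    """Total cost of a hostile rule for the forced decision (no "ND") case."""
    # False negative: hostile target falls in a state the rule does not
    # declare hostile (complement of the rule).
    p_fn = sum(p * (1 - r) for p, r in zip(p_h, rule))
    # False positive: friendly target falls in a state declared hostile.
    p_fp = sum(p * r for p, r in zip(p_f, rule))
    prior_f = 1.0 - prior_h  # assumes only two true classes
    return c_fn * prior_h * p_fn + c_fp * prior_f * p_fp


# Hypothetical probabilities for states (H,H), (H,F), (F,H), (F,F).
p_h = [0.56, 0.24, 0.14, 0.06]
p_f = [0.01, 0.09, 0.09, 0.81]

# Complete enumeration of all 2^4 = 16 hostile rules.
all_rules = list(product((0, 1), repeat=len(p_h)))
best_rule = min(all_rules, key=lambda r: total_cost(r, p_h, p_f, 0.5, 1.0, 1.0))
best_cost = total_cost(best_rule, p_h, p_f, 0.5, 1.0, 1.0)
```

With equal priors and equal costs, a state is worth declaring hostile whenever it is more probable under a hostile than under a friend, so the enumeration selects the rule (1, 1, 1, 0) for these values.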

2.5.6  Notation 

Thus far, rules were limited to hostile or friend with the following hostile notation:
$R = (r_1, r_2, \ldots, r_N)$ where $r_i \in \{0,1\}$ for $i = 1, 2, \ldots, N$ with $N$
representing the possible number of output combinations from classifiers as shown in
Table 2-3. It was implicitly assumed that the friend rule was simply the complement to the
hostile rule or $\bar{R} = (\bar{r}_1, \bar{r}_2, \ldots, \bar{r}_N)$ where
$\bar{r}_i = 1 - r_i$ for $i = 1, 2, \ldots, N$. In later sections, this will be
extended to a non-declared rule requiring the notation from Ralston (1998) to be
expanded as follows. Let $H$ represent the hostile rule where $H = (h_1, h_2, \ldots, h_N)$
with $h_i \in \{0,1\}$; let $F$ denote the friend rule with $F = (f_1, f_2, \ldots, f_N)$ and
$f_i \in \{0,1\}$; further, let $ND$ be the non-declared rule with
$ND = (nd_1, nd_2, \ldots, nd_N)$ and $nd_i \in \{0,1\}$. These rules are mutually exclusive
and collectively exhaustive such that $h_i + f_i + nd_i = 1$ for all $i = 1, 2, \ldots, N$.
This notation will be further addressed in 3.2.4.
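
The requirement that the three rules partition the output states can be checked mechanically. The following Python sketch is an illustration only; the function name `is_valid_rule_set` and the example rules over the nine states of Table 2-3 are hypothetical.

```python
def is_valid_rule_set(h, f, nd):
    """Check that the hostile, friend, and non-declared rules are binary,
    equally long, and assign every output state to exactly one declaration."""
    if not (len(h) == len(f) == len(nd)):
        return False
    if any(x not in (0, 1) for x in (*h, *f, *nd)):
        return False
    return all(hi + fi + ndi == 1 for hi, fi, ndi in zip(h, f, nd))


# Hypothetical rules over the states (H,H), (H,U), (H,F), (U,H), (U,U),
# (U,F), (F,H), (F,U), (F,F): conflicting or uninformative states are
# left non-declared.
hostile_rule = (1, 1, 0, 1, 0, 0, 0, 0, 0)
friend_rule = (0, 0, 0, 0, 0, 1, 0, 1, 1)
nd_rule = (0, 0, 1, 0, 1, 0, 1, 0, 0)

valid = is_valid_rule_set(hostile_rule, friend_rule, nd_rule)
# Declaring state (H,H) as both hostile and friend violates the partition.
invalid = is_valid_rule_set((1,) + hostile_rule[1:], (1,) + friend_rule[1:], nd_rule)
```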

2.6  PNN  Fusion  Method 

A Probabilistic Neural Network (PNN) is a useful tool proven to converge to the
Bayesian optimal classifier if given enough data for training (Wasserman, 1993). The
PNN trains very quickly and is robust to noise (Wasserman, 1993). The number of
computations required to make classifications with a PNN depends greatly on the size of
the training set (Wasserman, 1993). Figure 2-4 shows a probabilistic neural network.

Figure 2-4: Probabilistic Neural Network (Wasserman, 1993)


Figure 2-4 shows a normalized input vector $X = (x_1, x_2, \ldots, x_n)$ fed to the
distribution layer of a PNN. The distribution layer is a connection point and does not
perform any computations (Wasserman, 1993). The weights applied to each distribution layer
vector heading into any one pattern layer correspond to a specific training vector. The
pattern layer computes the sum of the weights contributed from every distribution layer
neuron and applies to it a non-linear function yielding $Z_{ci}$; $c$ corresponds to the
particular training vector used and $i$ indicates the pattern layer involved in the
computation (Wasserman, 1993). Each $Z_{ci}$ is formed by the equation

$$Z_{ci} = \exp\left[\frac{X_{Ri}^{T} X - 1}{\sigma^2}\right]$$

where $X_{Ri}$ denotes the particular training vector utilized for the pattern layer
considered (Wasserman, 1993). The summation layer, related to a particular class, takes all
$Z_{ci}$ from its class and computes

$$S_c = \sum_i \exp\left[\frac{X_{Ri}^{T} X - 1}{\sigma^2}\right]$$

(Wasserman, 1993). For the PNN displayed in Figure 2-4, there are two possible classes
corresponding to the two summation layer neurons. The outputs from the summation layers are
then compared at the decision layer, yielding a one if $S_A > S_B$ and a zero if the
opposite is true. Designating a target with a one labels it as class A. Extending to more
classes simply requires additional pattern layer neurons and summation layers; class
membership is then determined by the largest summation layer value (Wasserman, 1993).
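
The layers described above can be sketched in plain Python. This is a minimal illustration of a PNN forward pass under stated assumptions, not the network used in this research: the names `normalize` and `pnn_classify`, the training vectors, and the smoothing value sigma = 0.5 are hypothetical, and the classes are assumed to have equal priors and equal numbers of training vectors so that raw sums can be compared.

```python
import math


def normalize(v):
    """Scale a vector to unit length, as the PNN assumes normalized inputs."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v]


def pnn_classify(x, training, sigma=0.5):
    """Classify x with a probabilistic neural network.

    training maps each class label to its list of training vectors; each
    training vector acts as the weight vector of one pattern layer neuron.
    Returns the winning label and the summation layer value of every class.
    """
    x = normalize(x)
    sums = {}
    for label, vectors in training.items():
        s = 0.0
        for t in vectors:
            t = normalize(t)
            dot = sum(a * b for a, b in zip(x, t))    # weighted sum
            s += math.exp((dot - 1.0) / sigma ** 2)   # pattern layer output
        sums[label] = s                               # summation layer
    # Decision layer: the class with the largest summation value wins.
    return max(sums, key=sums.get), sums


# Hypothetical two-feature training set with two classes.
training = {"A": [[1.0, 0.1], [0.9, 0.2]],
            "B": [[0.1, 1.0], [0.2, 0.9]]}
label_a, sums_a = pnn_classify([1.0, 0.0], training)
label_b, sums_b = pnn_classify([0.0, 1.0], training)
```

Because every training vector becomes a pattern layer neuron, the cost of classifying one exemplar grows linearly with the size of the training set, which matches the computational remark made at the start of this section.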

2.7  Data  Generation 


The data used for this research were created in Leap (2004). Several different data sets
were  created  to  test  a  broad  range  of  problems.  Two  discriminant  functions  were  utilized 
to  serve  as  classifiers,  linear  and  quadratic.  The  linear  classifier  was  created  with 
assumed  equal  covariance  and  equal  prior  probabilities  of  class  membership.  It  generated 
the  posterior  probabilities  of  class  membership  for  use  in  fusion.  The  equation  used  for 
the  posterior  probabilities  in  the  linear  classifier  was 


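$$P(\pi_j \mid x_0) = \frac{\exp\left[-\tfrac{1}{2}(x_0 - \mu_j)^{T} \Sigma^{-1} (x_0 - \mu_j)\right]}{\sum_{i} \exp\left[-\tfrac{1}{2}(x_0 - \mu_i)^{T} \Sigma^{-1} (x_0 - \mu_i)\right]}$$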


The  quadratic  classifier  operated  under  the  assumptions  that  the  prior  probabilities  were 
equal,  but  the  covariance  matrices  could  not  be  assumed  equal.  The  posterior  probability 
equation  used  for  the  quadratic  classifier  was 
Two types of correlation were considered to match possible real world scenarios,
inter-correlation and intra-correlation. Inter-correlation describes the correlation
between the features of a data set (Leap, 2004). For instance, the height and width of a
particular target may have a strong correlation; the length of a T-72 tank is in
proportion to its width. Table 2-4 demonstrates inter-correlation.


Table 2-4: Correlation table from Leap (2004).

                Correlation       Correlation       Correlation
Feature 1, f1     Feature 2, f2     Feature 3, f3     Feature 4, f4
Exemplar 1        Exemplar 1        Exemplar 1        Exemplar 1
Exemplar 2        Exemplar 2        Exemplar 2        Exemplar 2
...               ...               ...               ...
Exemplar N        Exemplar N        Exemplar N        Exemplar N

The next type of correlation addressed is intra-correlation, or autocorrelation. This explores the relationship between incoming exemplars in a given feature. Table 2-5 depicts the autocorrelation of a feature.

Table 2-5: Correlation table from Leap (2004).

[Single column for Feature 1, f1, with correlation indicated between successive entries.]


The  six  problems  considered  in  this  research  are  encapsulated  in  Table  2-6.  The  icon 
shown  for  each  problem  depicts  the  locations  of  data  centroids  for  the  different  classes 
used. 

Table 2-6: Problem Description from Leap (2004).

| Problem # | Problem Name | Problem Description |
|---|---|---|
| 1 | 4 Feature Case | Recreates Storm work; average cost surface of 5 runs as response |
| 2 | 8 Feature Case | Adds noise and redundant features to problem 1; changes mean of class 1 |
| 3 | 8 Feature with Autocorrelation Case | Adds autocorrelation to problem 2; changes mean of class 1 |
| 4 | 8 Feature Triangle Case | Changes geometry of problem 2 |
| 5 | 8 Feature XOR Case | Changes geometry of problem 4 |
| 6 | 8 Feature XOR with Autocorrelation Case | Adds autocorrelation to problem 5 |

[Centroid icons for each problem not reproduced.]

The above problems are all variants of the same structure of correlation matrices and mean vectors. For all problems considered, $F = F_1 \times F_2 \subset \mathbb{R}^n$, where $n$ is the total number of features considered in the problem (assumed to be an even integer). Thus, $F_1 \subset \mathbb{R}^{n/2}$ represents the features observed by classifier 1, the linear discriminant function, and $F_2 \subset \mathbb{R}^{n/2}$ represents the features observed by classifier 2, the quadratic discriminant function. Problems 5 and 6 created a space that was too difficult for the linear and quadratic classifiers to discriminate. Thus, a probabilistic neural network will also be used as a classifier for these problems and fused with a quadratic function.
Assuming only two possible classes, class 0 and class 1, let $F_i^j$ be the features from feature set $i$ in class $j$, where $F_1 = F_1^0 \cup F_1^1$ and $F_2 = F_2^0 \cup F_2^1$ (Leap, 2004). The mean vector for feature set $i$ in class $j$ is represented by $\mu_i^j$. The correlation of the data is given by

$$\Sigma^i = \begin{bmatrix} \Sigma^i_{F_1} & \Sigma^i_{F_1,F_2} \\ \Sigma^i_{F_2,F_1} & \Sigma^i_{F_2} \end{bmatrix}$$

for all class $i$ (Leap, 2004). For the purposes of all data sets considered, the covariances of the two classes were set equal to each other ($\Sigma^0 = \Sigma^1$).

The correlation between and within feature sets was created through the use of different $\rho$ values within the correlation matrices. Four different $\rho$ values will be addressed: $\rho$, $\rho_{red}$, $\rho_{ind}$, and $\rho_{auto}$. The primary correlation is included through $\rho$. This correlation affects the correlations between features, with $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$ (Leap, 2004). The correlation due to the addition of a redundant feature is attributed to $\rho_{red} = 0.5$; the correlation induced by the combination of $\rho$ and $\rho_{red}$ is characterized by $\rho_{ind} = \rho \times \rho_{red}$; the correlation within a feature set is described by $\rho_{auto} \in \{0.0, 0.5, 0.9\}$ (Leap, 2004).

Generating multivariate normal data with autocorrelation required the use of some equations found in Laine (2003). Let $z(t)$, where $t \in \{1, 2, \ldots, N\}$, be an exemplar of the current feature space, with $N$ being the total number of exemplars present (Laine, 2003; Leap, 2004). The distribution of each exemplar is then $z(t) \sim N(0, \Sigma^0)$ (Laine, 2003; Leap, 2004). Letting $A = \rho_{auto} I$, $B = \sqrt{1 - \rho_{auto}^2}\, I$, and $\varepsilon(t) \sim N(0, B \Sigma^0 B)$ for each exemplar $t$ allows the following to hold: $z(t) = A z(t-1) + \varepsilon(t)$ (Laine, 2003; Leap, 2004). Problems 5 and 6 were too difficult for a linear discriminant function to separate, so they were revisited using a PNN and a quadratic discriminant function as classifiers.
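The recursion $z(t) = A z(t-1) + \varepsilon(t)$ can be sketched in a few lines. This is only an illustrative sketch, not the code used in Leap (2004) or Laine (2003): the function name `generate_autocorrelated`, the use of NumPy, and the seed handling are assumptions made here. Because $A \Sigma^0 A + B \Sigma^0 B = \Sigma^0$, each exemplar keeps the marginal distribution $N(0, \Sigma^0)$ while successive exemplars are correlated at $\rho_{auto}$.

```python
import numpy as np

def generate_autocorrelated(n_exemplars, sigma, rho_auto, seed=0):
    """Draw exemplars z(1..N) following z(t) = A z(t-1) + eps(t).

    Hypothetical helper: sigma plays the role of Sigma^0 in the text.
    """
    rng = np.random.default_rng(seed)
    d = sigma.shape[0]
    A = rho_auto * np.eye(d)
    B = np.sqrt(1.0 - rho_auto**2) * np.eye(d)
    eps_cov = B @ sigma @ B                      # covariance of eps(t)
    # Draw every innovation up front, then apply the recursion.
    eps = rng.multivariate_normal(np.zeros(d), eps_cov, size=n_exemplars)
    z = np.empty((n_exemplars, d))
    z[0] = rng.multivariate_normal(np.zeros(d), sigma)   # z(1) ~ N(0, Sigma^0)
    for t in range(1, n_exemplars):
        z[t] = A @ z[t - 1] + eps[t]
    return z
```

A class mean vector can be added to every row afterwards, since the recursion is defined for zero-mean exemplars.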


2.7.1 Problem 1: 4 Feature Case

This is the only problem considering 4 features. Let $F_1 \subset \mathbb{R}^2$ be the features observed by classifier 1 and $F_2 \subset \mathbb{R}^2$ be the features observed by classifier 2. It follows that $F = F_1 \times F_2 \subset \mathbb{R}^4$. The data were created such that the features within individual feature sets were statistically independent, as shown by

$$\Sigma_{F_1} = \Sigma_{F_2} = \begin{bmatrix} 1 & 0 \\ 0 & 1 \end{bmatrix}$$

(Leap, 2004). The correlation matrix between features in $F_i, F_j$ where $i \neq j$ is represented by

$$\Sigma_{F_1,F_2} = \Sigma_{F_2,F_1} = \begin{bmatrix} 0 & \rho \\ \rho & 0 \end{bmatrix}$$

where $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$ (Leap, 2004). The mean vectors for all classes and feature sets are as follows: $\mu_1^0 = (0, 0)^T$, $\mu_1^1 = (0.95, 0.95)^T$, $\mu_2^0 = (0, 0)^T$, $\mu_2^1 = (1.15, 1.15)^T$. The feature sets for problem 1 are distributed as follows:

$$F_i^j \sim N(\mu_i^j, \Sigma_{F_i}) \quad \text{for } i = 1, 2 \text{ and } j = 0, 1.$$
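As a rough illustration of the problem 1 setup, the sketch below assembles the full 4 x 4 correlation matrix from the within-set and between-set blocks and draws both classes from it. The function names, the feature ordering (feature set 1 first, then feature set 2), and the use of NumPy are assumptions made here, not details taken from Leap (2004).

```python
import numpy as np

def problem1_correlation(rho):
    """4 x 4 correlation matrix: identity within sets, rho across sets."""
    within = np.eye(2)
    cross = np.array([[0.0, rho],
                      [rho, 0.0]])
    return np.block([[within, cross],
                     [cross.T, within]])

def sample_problem1(n_per_class, rho, seed=0):
    """Draw class 0 and class 1 exemplars (hypothetical helper)."""
    rng = np.random.default_rng(seed)
    sigma = problem1_correlation(rho)
    mu_class0 = np.zeros(4)
    # Feature set 1 mean (0.95, 0.95) followed by feature set 2 mean (1.15, 1.15).
    mu_class1 = np.array([0.95, 0.95, 1.15, 1.15])
    x0 = rng.multivariate_normal(mu_class0, sigma, size=n_per_class)
    x1 = rng.multivariate_normal(mu_class1, sigma, size=n_per_class)
    return x0, x1
```

The eigenvalues of this matrix are $1 \pm \rho$, so it stays positive definite for every $\rho$ level listed in the text.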


2.7.2 Problem 2: 8 Feature Case

This problem adds a feature vector to problem 1 as noise variables and another feature vector as redundant features. The class 1 mean vector is also changed as reflected below: $\mu_1^0 = (0, 0, 0, 0)^T$, $\mu_1^1 = (0.5, 0.5, 0.5, 0)^T$, $\mu_2^0 = (0, 0, 0, 0)^T$, $\mu_2^1 = (0.75, 0.75, 0.75, 0)^T$. The feature vector distributions remain unchanged symbolically, with the only changes occurring due to the altered class 1 mean vector and the additional features. The correlation due to

$$\Sigma_{F_1} = \Sigma_{F_2} = \begin{bmatrix} 1 & 0 & 0 & 0 \\ 0 & 1 & \rho_{red} & 0 \\ 0 & \rho_{red} & 1 & 0 \\ 0 & 0 & 0 & 1 \end{bmatrix}$$

and

$$\Sigma_{F_1,F_2} = \Sigma_{F_2,F_1} = \begin{bmatrix} 0 & \rho & \rho_{ind} & 0 \\ \rho & 0 & 0 & 0 \\ \rho_{ind} & 0 & 0 & 0 \\ 0 & 0 & 0 & 0 \end{bmatrix}$$

comprises the overall correlation (Leap, 2004).

The $\rho_{ind} = \rho \times \rho_{red}$ comes about because feature 2 and feature 3 are correlated by design, as are feature 6 and feature 7, as shown by $\Sigma_{F_1}$ and $\Sigma_{F_2}$. The addition of $\rho$ between features 1 and 6 causes an induced correlation between features 1 and 7, with a similar occurrence between features 3 and 5 due to the correlation between features 2 and 5.
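To make the block structure concrete, the following sketch assembles the 8 x 8 correlation matrix for problem 2 from the within-set and between-set blocks given above. The function name and the default $\rho_{red} = 0.5$ argument are illustrative assumptions; the code is not taken from Leap (2004).

```python
import numpy as np

def problem2_correlation(rho, rho_red=0.5):
    """8 x 8 correlation matrix for features 1-4 (set 1) and 5-8 (set 2)."""
    rho_ind = rho * rho_red                      # induced correlation
    within = np.array([[1.0, 0.0,     0.0,     0.0],
                       [0.0, 1.0,     rho_red, 0.0],
                       [0.0, rho_red, 1.0,     0.0],
                       [0.0, 0.0,     0.0,     1.0]])
    cross = np.array([[0.0,     rho, rho_ind, 0.0],
                      [rho,     0.0, 0.0,     0.0],
                      [rho_ind, 0.0, 0.0,     0.0],
                      [0.0,     0.0, 0.0,     0.0]])
    return np.block([[within, cross],
                     [cross.T, within]])
```

With zero-based indexing, `problem2_correlation(rho)[0, 5]` is the correlation between features 1 and 6, and `[0, 6]` is the induced correlation between features 1 and 7. Setting $\rho_{ind} = \rho \times \rho_{red}$ keeps the matrix positive definite: each correlated triple of features has determinant $(1 - \rho^2)(1 - \rho_{red}^2)$.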


2.7.3 Problem 3: 8 Features with Autocorrelation Case

This problem adds autocorrelation to problem 2. The mean vectors were also varied in the hope of covering a broader spectrum of problem types. For this problem, the mean vectors are $\mu_1^0 = (0, 0, 0, 0)^T$, $\mu_1^1 = (0.95, 0.95, 0.95, 0)^T$, $\mu_2^0 = (0, 0, 0, 0)^T$, and $\mu_2^1 = (1.15, 1.15, 1.15, 0)^T$, respectively. The covariance matrices remain unchanged from problem 2. The distributions of the feature vectors are the same symbolically, with the only change due to the new mean vectors. The levels of correlation are the same as problem 2 such that $\rho \in \{0.0, 0.2, 0.4, 0.6, 0.8, 0.9\}$, $\rho_{red} = 0.95$, and $\rho_{ind} = \rho \times \rho_{red}$ (Leap, 2004). Autocorrelation is introduced within each feature set at $\rho_{auto} \in \{0.0, 0.5, 0.9\}$, covering a low, medium, and high setting of within correlation.

2.7.4 Problem 4: 8 Feature Triangle Case

The triangle problem varies the geometry of the problem to consider yet another extended environment. There are four multivariate normal populations created, two from class 0 and two from class 1, with three different mean vectors (Leap, 2004). The problem complexity rises, making it a little more difficult to separate with a simple discriminant function. The covariance matrices of the data remain the same, based on 2 independent features in each class, 1 redundant feature, and 1 noise feature independent of all other features. The feature vectors of these new data sets are defined to be $F_i^{jk}$, where $i$ is the feature set number, $j$ is the class, and $k$ is the geometric location of the group of features ($i = 1, 2$; $j = 0, 1$; $k = 1, 2$) (Leap, 2004). For instance, $F_1^{01}$ corresponds to the first set of features from feature set 1 in class 0 and $F_1^{02}$ corresponds to the second set of features from feature set 1 in class 0. Thus the feature vectors now become $F_i^j = F_i^{j1} \cup F_i^{j2}$ and $F_i = F_i^0 \cup F_i^1$ (Leap, 2004). The feature vectors are now distributed as $F_i^{jk} \sim N(\mu_i^{jk}, \Sigma_{F_i})$ for all $i, j, k$. The mean vectors reflect the same symbolic notation and are changed to be $\mu_1^{01} = (0, 0, 0, 0)^T$, $\mu_1^{02} = (0.95, 0.95, 0.95, 0)^T$, $\mu_1^{11} = (0.95, 0, 0, 0)^T$, $\mu_1^{12} = (0.95, 0, 0, 0)^T$ for feature set 1 and $\mu_2^{01} = (0, 0, 0, 0)^T$, $\mu_2^{02} = (1.15, 1.15, 1.15, 0)^T$, $\mu_2^{11} = (1.15, 0, 0, 0)^T$, $\mu_2^{12} = (1.15, 0, 0, 0)^T$ for feature set 2 (Leap, 2004).

2.7.5 Problem 5: 8 Feature XOR Case

This again alters the geometry, this time of problem 4. Problem 4 also generated four populations, but now all four have different mean vectors, making a linear discriminant function alone useless. As shown in the icon, four multivariate normal populations will be generated with equal covariance matrices and different means. All of the specifics of this problem are the same as the triangle problem except for the change in mean vectors. The mean vectors have now been changed to the following: $\mu_1^{01} = (0, 0, 0, 0)^T$, $\mu_1^{02} = (0.95, 0.95, 0.95, 0)^T$, $\mu_1^{11} = (0, 0.95, 0.95, 0)^T$, $\mu_1^{12} = (0.95, 0, 0, 0)^T$, $\mu_2^{01} = (0, 0, 0, 0)^T$, $\mu_2^{02} = (1.15, 1.15, 1.15, 0)^T$, $\mu_2^{11} = (0, 1.15, 1.15, 0)^T$, and $\mu_2^{12} = (1.15, 0, 0, 0)^T$ (Leap, 2004).

2.7.6 Problem 6: 8 Feature XOR with Autocorrelation Case

This problem aims at more extended environments through the use of autocorrelation added to an already difficult discrimination problem. The XOR case from problem 5 above is now altered to consider correlation within a feature set along with correlation across features as visited previously. There are four multivariate normal populations with the same covariance matrices and mean vectors as in problem 5. The only change is the addition of autocorrelation, which is accomplished in the same manner as in problem 3. The within correlation is once again set to $\rho_{auto} \in \{0.0, 0.5, 0.9\}$.

3. Methodology


3.1  Introduction 

This chapter details how the research was conducted. First, the experimental design will be described through the use of an example problem. Next, a new cost function will be developed for total cost while considering non-declarations. Then, a heuristic for creating a non-declaration rule from the forced decision ISOC fusion method will be demonstrated. This forced decision heuristic will then be extended to the ISOC non-declaration political correctness method. The boundary rules for ISOC fusion will be addressed through a likelihood ratio heuristic. Finally, the PNN fusion method will be revisited to include the new total cost function.

3.2  Experimental  Design 

This research effort attempted to study the effects of correlation and autocorrelation on classifier fusion when non-declarations were introduced. Two true states of nature were considered, hostile and friend, with equal a priori probabilities of each state. Non-declarations were introduced both at the individual classifier level and at the fused classification level. At the classifier level, a posterior probability threshold of $T = 0.5$ was set and an indifference window $\delta_j^i$ was introduced to allow classifiers to non-declare exemplars that had a high probability of misclassification; the indifference window was denoted $(T \pm \delta_j^i)$, where $\delta_j^i = (i - 1) \times 0.05$ for $j = 1, 2$ and $i = 1, 2, \ldots, 11$; $\delta_j^i$ denotes the size of the window for classifier $j$ in the $i$th configuration. Thus, exemplars were classified in the following manner. Let $P_k$ be the posterior probability associated with exemplar $k$ being a true hostile, where $k = 1, 2, \ldots, \#\,\text{exemplars}$; further, let "H" denote an exemplar labeled hostile, "F" indicate a friend classification, and "ND" denote undeclared. When $i = 1$, $\delta_j^1 = 0$ and the classifier $j$ labels for exemplar $k$ were forced to "H" or "F" based on the threshold $T$, and any exemplar equal to $T$ was declared "F"; when $i = 11$, $\delta_j^{11} = 0.5$ and all classifier $j$ labels were forced to "ND" with no hostile or friend declarations; otherwise, labels were determined based upon the following:

if $P_k > T + \delta_j^i$, then the exemplar is labeled "H";
if $P_k < T - \delta_j^i$, then the exemplar is labeled "F";
else the exemplar is labeled "ND".
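The labeling rule above can be written as a small function. This is a minimal sketch under the stated rule; the names `window_size` and `label_exemplar` are hypothetical, and the two boundary configurations ($i = 1$ and $i = 11$) are handled explicitly so that ties at $T$ go to "F" and the widest window always yields "ND", as the text specifies.

```python
def window_size(i):
    """Indifference window half-width for configuration i = 1, ..., 11."""
    return (i - 1) * 0.05

def label_exemplar(p_k, i, threshold=0.5):
    """Return 'H', 'F', or 'ND' for posterior p_k under configuration i."""
    if not 1 <= i <= 11:
        raise ValueError("configuration index i must be in 1..11")
    if i == 1:
        # No window: forced decision, with ties at the threshold declared friend.
        return "H" if p_k > threshold else "F"
    if i == 11:
        # Window covers the whole [0, 1] range: nothing is declared.
        return "ND"
    delta = window_size(i)
    if p_k > threshold + delta:
        return "H"
    if p_k < threshold - delta:
        return "F"
    return "ND"
```

For example, with $i = 3$ the window is $0.5 \pm 0.1$, so a posterior of 0.62 is labeled "H", 0.35 is labeled "F", and 0.55 is left undeclared.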


Figure 3-1 shows an individual classifier indifference window. In the figure, the posterior probabilities of being in a given class resemble normal distributions with equal variances, but this method of classification could be performed on most distributions.

[Figure: Single Sensor Indifference Window]

Figure 3-1: Individual Classifier Indifference Window.

Non-declarations were also allowed from the fused classifier output through the heuristics described later in this section. For the PNN, an indifference window was introduced after the features had been fused. Table 3-1 summarizes the design considerations used in ISOC forced decision (IFD), the ISOC non-declaration political correctness heuristic (INDPC), and PNN fusion with non-declarations (NFND).

Table 3-1: Design Considerations

| Problem # | Considerations | # Data Sets | Classifiers Used |
|---|---|---|---|
| 1 | Sample size, inter-correlation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |
| 2 | Sample size, inter-correlation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |
| 3 | Sample size, inter-correlation, autocorrelation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |
| 4 | Sample size, inter-correlation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |
| 5 | Sample size, inter-correlation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic; PNN/Quadratic |
| 6 | Sample size, inter-correlation, autocorrelation | 15; 3 generated sets for 5 random seeds | Linear/Quadratic; PNN/Quadratic |

The ISOC non-declaration likelihood ratio heuristic (INDLR) described later held sample size, problem, and correlation levels constant. The heuristic was meant to find the optimal rule set by likelihood ratios rather than by complete enumeration. Table 3-2 summarizes the considerations for the second heuristic.


Table 3-2: ISOC Non-Declaration Heuristic (INDLR) Considerations

| Problem # | Considerations | # Data Sets | Classifiers Used |
|---|---|---|---|
| 3 | Sample size = 250, $\rho = \rho_{auto} = 0$ | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |

The final experiment conducted was a full factorial design considering the potential interactions between subjective costs and chosen correlation levels. Table 3-3 summarizes the design considerations.

Table 3-3: RSM Design Considerations

| Problem # | Considerations | # Data Sets | Classifiers Used |
|---|---|---|---|
| 3 | Sample size = 250; $\rho = 0, 0.8$; $\rho_{auto} = 0, 0.9$; $C_{FP} = 10, 20$; $C_{FN} = 5, 9$; $C_{ND} = 1, 4$ | 15; 3 generated sets for 5 random seeds | Linear/Quadratic |

3.2.1  Test  Problem 

An easily separable problem was desired to test the methodology before extending to the more difficult problems analyzed in Chapter 4. This problem will be referenced throughout this chapter to further explain methods and logic. The test problem was created to mirror the experimental design laid out above. To ensure small sample size problems were not encountered, a sample size of 100 exemplars per class was chosen. The problem was designed such that

$$\mu_2 = \begin{bmatrix} 6 & 2.75 \end{bmatrix}, \quad \Sigma_2 = \begin{bmatrix} 1 & -1.5 \\ -1.5 & 3 \end{bmatrix}$$

[The class 1 parameters $\mu_1$ and $\Sigma_1$ are not recoverable from the source.]

This geometry created a problem that allowed both a linear and a quadratic discriminant function to easily separate classes to a reasonable accuracy.

3.2.2 ISOC Forced Decision Fusion (IFD)

The first step in considering non-declarations was to allow individual classifiers to label exemplars as unknown. In order to model non-declarations, a grid of possible indifference windows was created for each classifier with the threshold set at $T = 0.5$. The indifference window was then $T \pm \delta_j^i$, where $\delta_j^i$ denotes the size of the window for classifier $j$ in the $i$th configuration and $\delta_j^i = (i - 1) \times 0.05$ for $j = 1, 2$ and $i = 1, \ldots, 11$. Individual classifier labels were applied to posteriors as described in section 3.2 above. Once all exemplars were labeled for each classifier, the labels were placed in a combined classifier performance matrix in preparation for complete enumeration using ISOC fusion. Next, all possible combinations of hostile and friend rules were tested and fused indications were forced to be "H" or "F". The total cost equation (2-1) had to be modified to consider the probability of non-declarations and their associated cost. This added cost would be a constant for a given grid point of $(\delta_1^i, \delta_2^i)$ settings and would only affect the overall minimum cost when compared over the range of $(\delta_1^i, \delta_2^i)$. The added cost due to non-declarations remained a constant for a specified grid point because non-declarations were only allowed at the individual classifier level; the posterior probability of a classifier classifying a target as non-declared was used in the calculation for total cost under the ISOC forced decision method. Under the assumption of independence of the feature sets sent to each classifier, the probability of non-declaration can be calculated as the probability of the union of two events: the linear function classifies the target as unknown, or the quadratic function classifies the target as unknown. Thus, it can be shown that

$$P(ND) = P(ND_1) + P(ND_2) - P(ND_1) \times P(ND_2)$$

where

$P(ND)$ = probability that a target is labeled unknown by either classifier
$ND_i$ = a target labeled non-declared from classifier $i$
$P(ND_i) = P(ND_i \mid H) \times P(H) + P(ND_i \mid F) \times P(F)$ for classifier $i$.

Note that $P(ND_1)$ and $P(ND_2)$ are simply the probabilities that the specified classifier non-declares an exemplar for a given grid point $(\delta_1^i, \delta_2^i)$. There are no non-declarations allowed from the overall fusion process at this point, only at the individual classifier classification level. The non-declaration term of the new total cost function in equation (3-1) for IFD is then constant for each grid point location.

$$C_{T,FD} = C_{FN} \times P(H) \times P(FN) + C_{FP} \times P(F) \times P(FP) + C_{ND} \times P(ND) \qquad (3\text{-}1)$$
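Equation (3-1) and the non-declaration probability that feeds it can be sketched as three small functions. The function names are hypothetical, and the default costs ($C_{FN} = 5$, $C_{FP} = 10$, $C_{ND} = 1$) and priors ($P(H) = P(F) = 0.5$) are the values this chapter states it uses unless otherwise noted; the input probabilities in the usage note are made up purely for illustration.

```python
def prob_non_declare(p_nd_given_h, p_nd_given_f, p_h=0.5):
    """P(ND_i) = P(ND_i | H) P(H) + P(ND_i | F) P(F) for one classifier."""
    return p_nd_given_h * p_h + p_nd_given_f * (1.0 - p_h)

def prob_any_non_declare(p_nd1, p_nd2):
    """P(ND) for two classifiers, assuming independent feature sets."""
    return p_nd1 + p_nd2 - p_nd1 * p_nd2

def total_cost_forced(p_fn, p_fp, p_nd, c_fn=5.0, c_fp=10.0, c_nd=1.0, p_h=0.5):
    """Total cost of equation (3-1) for ISOC forced decision fusion."""
    p_f = 1.0 - p_h
    return c_fn * p_h * p_fn + c_fp * p_f * p_fp + c_nd * p_nd
```

As an illustration with invented inputs: if classifier 1 non-declares 30% of hostiles and 10% of friends, and classifier 2 non-declares 40% and 20%, then $P(ND_1) = 0.2$, $P(ND_2) = 0.3$, and $P(ND) = 0.44$. With $P(FN) = 0.1$ and $P(FP) = 0.05$, the total cost is $0.25 + 0.25 + 0.44 = 0.94$.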

Now that a cost function is created, total cost must be calculated across all grid points for all possible hostile rules, $\delta_1^i \times \delta_2^i \times 512$, with $\delta_j^i$ denoting the size of the indifference window on classifier $j$ and 512 representing the total number of hostile rules possible (each of the nine joint classifier output states is either included in or excluded from the hostile rule, giving $2^9 = 512$ rules). Since fused non-declarations were not allowed from the IFD, the friend rule associated with a given hostile rule is simply its complement. This ISOC forced decision process yields a set of optimal hostile and friend rules based on minimizing cost for each grid point.

IFD was conducted on all of the problems shown in Table 3-1.
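Complete enumeration at a single grid point might look like the sketch below. It is an illustration rather than the implementation used in this research: the function name is hypothetical, the non-declaration term of equation (3-1) is left out because it is constant at a grid point, and the tie tolerance of 1e-12 is an assumption. Every rule that ties for the minimum is kept, since alternate optima are expected.

```python
import itertools
import numpy as np

def enumerate_hostile_rules(p_s_h, p_s_f, c_fn=5.0, c_fp=10.0, p_h=0.5):
    """Test all 2^N hostile rules; the friend rule is the complement."""
    p_s_h = np.asarray(p_s_h, dtype=float)   # P(S_j | H) for each state j
    p_s_f = np.asarray(p_s_f, dtype=float)   # P(S_j | F) for each state j
    best_cost, best_rules = np.inf, []
    for bits in itertools.product((0, 1), repeat=len(p_s_h)):
        hostile = np.array(bits)
        friend = 1 - hostile
        p_fn = float(p_s_h @ friend)          # hostiles declared friend
        p_fp = float(p_s_f @ hostile)         # friends declared hostile
        cost = c_fn * p_h * p_fn + c_fp * (1.0 - p_h) * p_fp
        if cost < best_cost - 1e-12:
            best_cost, best_rules = cost, [hostile]
        elif abs(cost - best_cost) <= 1e-12:
            best_rules.append(hostile)
    return best_cost, best_rules
```

Applied to the notional classifier performance matrix of Table 3-6, this search returns a minimum cost of 0.8 with two alternate optimal rules. They differ only in whether the never-observed U-U state is placed in the hostile rule.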

3.2.3  ISOC  Non-Declaration  Heuristic  for  Political  Correctness  (INDPC) 

The above ISOC forced decision method was extended to allow a fused non-declared indication. Once non-declarations were allowed as an output from the fusion process, the total cost function from equation (3-1) was modified to be

$$C_{T,ND} = P(FN) \times C_{FN} \times P(H) + P(FP) \times C_{FP} \times P(F) + C_{ND} \times \left[ P(ND_H) \times P(H) + P(ND_F) \times P(F) \right] \qquad (3\text{-}2)$$

where

$C_{T,ND}$ = total cost for the rule being tested allowing non-declarations
$C_{FN} = 5$, cost of a false negative
$P(H) = 0.5$, a priori probability of a target being hostile
$P(FN) = \sum_{j=1}^{N} P(S_j \mid H) \times f_j$ = probability of a false negative
$C_{FP} = 10$, cost of a false positive
$P(F) = 0.5$, a priori probability of a target being a friend
$P(FP) = \sum_{j=1}^{N} P(S_j \mid F) \times h_j$ = probability of a false positive
$C_{ND} = 1$, cost of a non-declaration
$P(ND_T) = \sum_{j=1}^{N} P(S_j \mid T) \times nd_j$ = probability of a non-declaration given truth $T \in \{H, F\}$

Here $f_j$, $h_j$, and $nd_j$ equal 1 when state $j$ belongs to the friend, hostile, and non-declared rule, respectively, and 0 otherwise.
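A sketch of the cost in equation (3-2) for one candidate set of rules follows. The rules are passed as 0/1 indicator vectors over the $N$ joint output states; the function name, the indicator-vector encoding, and the partition check are assumptions made for illustration. The default costs and priors are the values listed in the definitions above.

```python
import numpy as np

def total_cost_nd(p_s_h, p_s_f, hostile, friend, non_declared,
                  c_fn=5.0, c_fp=10.0, c_nd=1.0, p_h=0.5):
    """Total cost of equation (3-2) for given hostile/friend/ND rules."""
    p_s_h = np.asarray(p_s_h, dtype=float)
    p_s_f = np.asarray(p_s_f, dtype=float)
    hostile = np.asarray(hostile)
    friend = np.asarray(friend)
    non_declared = np.asarray(non_declared)
    # Every state must belong to exactly one of the three rules.
    if not np.array_equal(hostile + friend + non_declared,
                          np.ones(len(p_s_h), dtype=int)):
        raise ValueError("rules must partition the states")
    p_f = 1.0 - p_h
    p_fn = p_s_h @ friend            # P(FN): hostiles declared friend
    p_fp = p_s_f @ hostile           # P(FP): friends declared hostile
    p_nd_h = p_s_h @ non_declared    # P(ND_H)
    p_nd_f = p_s_f @ non_declared    # P(ND_F)
    return (p_fn * c_fn * p_h + p_fp * c_fp * p_f
            + c_nd * (p_nd_h * p_h + p_nd_f * p_f))
```

Moving a state from the friend rule to the non-declared rule trades its false negative cost for the smaller non-declaration cost, which is how a fused non-declaration can lower the total.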

The above costs were set by a subject matter expert; they will be used throughout this research unless otherwise stated. The heuristic in Figure 3-2 was developed to incorporate the new total cost function and find an optimal rule set by filtering the optimal rules realized through ISOC fusion with a forced decision.

Figure 3-2: ISOC Non-Declaration Heuristic INDPC.

The first step in the heuristic is to perform complete enumeration of ISOC forced decision fusion as explained in 3.2.2 above. This results in a minimum cost for each grid point and all possible rule combinations that can reach that minimum cost. Table 3-4 shows the cost results for step 1 of the example problem.


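Table 3-4: Example Problem Grid Point Min Costs.

| $\delta_1$ | $\delta_2$ | min($C_{T,FD}$) |
|---|---|---|
| 0 | 0 | 0.575 |
| 0 | 0.05 | 0.59 |
| 0.05 | 0.1 | 0.4341 |
| 0.05 | 0.15 | 0.4085 |
| 0.05 | 0.2 | 0.4231 |
| 0.05 | 0.25 | 0.4878 |
| 0.5 | 0.45 | 1.425 |
| 0.5 | 0.5 | 2.5 |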

Once the global minimum cost is located, step 2 can be performed. This global minimum generally yields several possible hostile and friend rule combinations for the specified grid point. These alternate optima are found because different combinations of declarations can yield the same cost. These rules are then filtered to a single hostile, friend, and non-declared rule. To accomplish this, calculate a row sum for each of the possible hostile rules at the specified grid point. Choose the hostile rule with the smallest row sum and the associated friend rule (in the case of a tie, the first rule with the smallest row sum is selected). The elements of these rules are then checked to ensure they are practical. Any hostile rule elements that do not receive at least one "H" output from a classifier become part of the non-declared rule; the friend rule is filtered into a friend and non-declaration rule in the same manner. It would not be very reassuring to have a fused "H" declaration resulting from two classifiers both classifying a target as "F". This can be thought of as a political correctness heuristic, in which the counterintuitive elements become part of the non-declared rule. Figure 3-3 illustrates this step in the heuristic for the example problem.
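The rule selection and filtering step can be sketched as follows. The state ordering matches the sensor indications used in this chapter (H-H through F-F, with U for unknown); the function name and the list-of-0/1 encoding of rules are assumptions for illustration only.

```python
STATES = ["H-H", "H-U", "H-F", "U-H", "U-U", "U-F", "F-H", "F-U", "F-F"]

def political_correctness_filter(hostile_rules):
    """Pick one hostile rule and split out the non-declared rule.

    hostile_rules: list of 0/1 lists over STATES, one per alternate optimum.
    Returns (hostile, friend, non_declared) as 0/1 lists.
    """
    # Smallest row sum wins; min() returns the first rule on a tie.
    hostile = min(hostile_rules, key=sum)
    friend = [1 - h for h in hostile]
    # Keep a hostile element only if at least one classifier said "H".
    keep_h = [int(h and "H" in s.split("-")) for h, s in zip(hostile, STATES)]
    # Keep a friend element only if at least one classifier said "F".
    keep_f = [int(f and "F" in s.split("-")) for f, s in zip(friend, STATES)]
    # Everything filtered out becomes part of the non-declared rule.
    non_declared = [1 - a - b for a, b in zip(keep_h, keep_f)]
    return keep_h, keep_f, non_declared
```

For instance, if the chosen hostile rule is {H-H, H-U, U-H}, its complement contains U-U, which has no "F" indication and therefore moves to the non-declared rule.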


[Figure: rule grids over the sensor indications H-H, H-U, H-F, U-H, U-U, U-F, F-H, F-U, F-F. Panels: Global Minimum Cost Hostile Rules; Hostile Rule after removing elements without "H"; Friend Rule after removing elements without "F".]

Figure 3-3: Example Problem Rules Selection.


Now that the preferred rules are chosen, $C_{T,PC}$ (the cost of equation (3-2) evaluated for the filtered rules) must be tested at all grid points to locate the optimal settings based upon the rules chosen. Thus, using this cost function, the entire grid space is retested and new costs are calculated. The costs achieved through the heuristic are never higher than, and most often lower than, those of the forced declaration, as shown in Table 3-5.


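Table 3-5: Example Problem Cost Recalculation.

| $\delta_1$ | $\delta_2$ | min($C_{T,FD}$) | min($C_{T,PC}$) |
|---|---|---|---|
| 0 | 0 | 0.58 | 0.58 |
| 0 | 0.05 | 0.59 | 0.58 |
| 0.4 | 0.2 | 0.81 | 0.26 |
| 0.4 | 0.25 | 0.77 | 0.25 |
| 0.4 | 0.3 | 0.78 | 0.25 |
| 0.4 | 0.35 | 0.78 | 0.26 |
| 0.4 | 0.4 | 0.75 | 0.25 |
| 0.4 | 0.45 | 0.79 | 0.26 |
| 0.4 | 0.5 | 1.63 | 0.71 |
| 0.5 | 0.45 | 1.43 | 0.63 |
| 0.5 | 0.5 | 2.50 | 1.00 |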

The maximum cost shown in Table 3-5 for ISOC forced decision at a given grid point is 2.5 ($C_{FN} \times P(H) = 5 \times 0.5$), which occurs when all exemplars are classified as friends. The maximum cost for the ISOC non-declaration political correctness heuristic (INDPC) is 1 ($C_{ND} = 1$), occurring when all exemplars are given non-declared labels. These maximum costs represent the largest minimum cost observed for a given grid point. The minimum cost for a grid point is bounded above by these maximum costs.

3.2.4 Likelihood Ratio ISOC Non-Declaration Heuristic (INDLR)

The heuristic that follows is an extension to the likelihood ratio method described in 2.5.4. This method attempts to find the optimal set of hostile, friend, and non-declared rules for a given grid point $(\delta_1, \delta_2)$. The first step is to calculate the likelihood ratio of being a true hostile for each state $j$ by dividing the probability of the state given a true hostile by the probability of the state given a true friend, $P(S_j \mid H) / P(S_j \mid F) = LR_j$. One difficulty with this step is that some elements $j$ of either $P(S_j \mid H)$ or $P(S_j \mid F)$, and sometimes both, might equal zero, creating either a divide-by-zero situation or an indeterminate form ($0/0$). This can be fixed by making the following assumptions. The first is that any time $P(S_j \mid H) = P(S_j \mid F) = 0$, this combination of outputs from the classifiers has not occurred and there is no information as to which declaration would make the most sense. In these cases, it is assumed that these states become part of the non-declared rule ND. This will not affect the value of the cost function while ensuring logical declarations are made. The next two occurrences need to be addressed a little differently. First, consider $P(S_j \mid H) = 0$ and $P(S_j \mid F) > 0$. In this case, the ratio can be computed and $LR_j = 0$ for all such occurrences. This makes state $j$ highly likely to be a friend, which is reasonable since there is no estimated probability that this occurrence will be a hostile. But when $P(S_j \mid H) > 0$ and $P(S_j \mid F) = 0$, the ratio cannot be calculated. The most reasonable label for this state would be hostile. This was accomplished in this research by making a temporary matrix of $P(S \mid F)$ and setting all states with zero probability of being a friend equal to $\varepsilon$, the smallest possible number in Matlab. This caused the ratio to be extremely large, making this state the most likely to be hostile. For instance, Table 3-6 shows a notional example where all three of the special cases detailed above occurred.


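Table 3-6: Classifier Performance Matrix Example.

| State j | Classifier 1 Indication | Classifier 2 Indication | $P(S \mid H)$ | $P(S \mid F)$ |
|---|---|---|---|---|
| 1 | H | H | 0.46 | 0.02 |
| 2 | H | U | 0.15 | 0.06 |
| 3 | H | F | 0 | 0.12 |
| 4 | U | H | 0.23 | 0 |
| 5 | U | U | 0 | 0 |
| 6 | U | F | 0.06 | 0.09 |
| 7 | F | H | 0.06 | 0.07 |
| 8 | F | U | 0.03 | 0.15 |
| 9 | F | F | 0.01 | 0.49 |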

Let $LR_j = LR_j[i]$ represent the likelihood ratio for state $j$, with $[i]$ denoting the $i$th order statistic for $LR_j$. The output state $j = 4$ (U-H), where classifier 1 declared the target as unknown and classifier 2 declared it hostile, only occurred on exemplars that were true hostiles. The likelihood ratio for this state was calculated by using $\varepsilon$ such that $LR_4 = LR_4[1]$ became the largest ordered likelihood ratio. On the other hand, for output state $j = 3$ (H-F), when classifier 1 declared the exemplar hostile and classifier 2 declared the exemplar friend, there were no occurrences of this state when an exemplar was a true hostile. The corresponding likelihood ratio became $LR_3 = LR_3[8]$, tying state 5 for the smallest likelihood ratio of being hostile. There were no instances in which state 5 (U-U) occurred, so its likelihood ratio was $LR_5 = LR_5[9]$ and it will become part of the non-declared rule. Once the states with no recorded instances of being friend were changed to $\varepsilon$, the likelihood ratio could be calculated and sorted from largest to smallest. Table 3-7 shows the sorted likelihood ratios from the notional example above. Note that the temporary $\sum_j P(S_j \mid F) \neq 1$, but this is only used as a proxy to calculate the likelihood ratios.


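Table 3-7: Sorted Likelihood Ratios Example.

| State j | Classifier 1 Indication | Classifier 2 Indication | $P(S \mid H)$ | $P(S \mid F)$ | temp $P(S \mid F)$ | $LR_j$ |
|---|---|---|---|---|---|---|
| 4 | U | H | 0.23 | 0 | $\varepsilon$ = 1E-210 | 2.3E+209 |
| 1 | H | H | 0.46 | 0.02 | 0.02 | 23 |
| 2 | H | U | 0.15 | 0.06 | 0.06 | 2.5 |
| 7 | F | H | 0.06 | 0.07 | 0.07 | 0.857143 |
| 6 | U | F | 0.06 | 0.09 | 0.09 | 0.666667 |
| 8 | F | U | 0.03 | 0.15 | 0.15 | 0.2 |
| 9 | F | F | 0.01 | 0.49 | 0.49 | 0.020408 |
| 3 | H | F | 0 | 0.12 | 0.12 | 0 |
| 5 | U | U | 0 | 0 | $\varepsilon$ = 1E-210 | 0 |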
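The zero-handling and sorting steps behind Table 3-7 can be sketched as below. This is an illustrative Python version, not the Matlab code used in the research: the function name is hypothetical, `EPS` reuses the 1E-210 stand-in shown in the table, and a stable sort is assumed so that tied ratios keep their original state order.

```python
import numpy as np

EPS = 1e-210  # stand-in for zero P(S | F), as shown in Table 3-7

def sorted_likelihood_ratios(p_s_h, p_s_f, eps=EPS):
    """Return (state numbers, LR values, never-observed flags), sorted by LR."""
    p_s_h = np.asarray(p_s_h, dtype=float)
    p_s_f = np.asarray(p_s_f, dtype=float)
    # States never observed under either truth go to the non-declared rule.
    never_seen = (p_s_h == 0) & (p_s_f == 0)
    # Temporary P(S | F): replace zeros so the ratio can be computed.
    temp_f = np.where(p_s_f == 0, eps, p_s_f)
    lr = p_s_h / temp_f
    # Sort from most hostile-like to most friend-like; stable for ties.
    order = np.argsort(-lr, kind="stable")
    return order + 1, lr[order], never_seen[order]
```

Run on the values of Table 3-6, the returned state order is 4, 1, 2, 7, 6, 8, 9, 3, 5, and only state 5 is flagged as never observed.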

Now that the states have been sorted according to their likelihood ratios (hostile to friend), the total cost of misclassification can be calculated; the likelihood ratio of being a friend is calculated by $P(S_j \mid F) / P(S_j \mid H)$. The total cost function from equation (3-2) applies to this heuristic. For the example given in this section, the costs and prior probabilities were assumed to be the following: $C_{FN} = 5$, $C_{FP} = 10$, $P(H) = P(F) = 0.5$. Let $P_{cum}(S_{[j]} \mid H)$ represent the cumulative probability of a state given hostile based on the likelihood ratio order statistics above. The probabilities for false positives and false negatives were calculated as follows: $P(FN_{[j]}) = 1 - P_{cum}(S_{[j]} \mid H)$ and $P(FP_{[j]}) = P_{cum}(S_{[j]} \mid F)$, where $j$ represents state $j$ of the classifier performance matrix and $P_{cum}(S_{[j]} \mid T)$ is the cumulative probability of state $[j]$ given truth $T \in \{H, F\}$. The costs associated with the sorted classifier performance matrix are shown in Table 3-8.


Table 3-8: Sorted Classifier Performance Matrix Costs.

State j   Classifier 1   Classifier 2   P(S|H)   P(S|F)   Pcum(S|H)   Pcum(S|F)   C_T
          Indication     Indication
-------   ------------   ------------   ------   ------   ---------   ---------   -----
   4           U              H          0.23     0        0.23        0           1.925
   1           H              H          0.46     0.02     0.69        0.02        0.875
   2           H              U          0.15     0.06     0.84        0.08        0.8
   7           F              H          0.06     0.07     0.9         0.15        1
   6           U              F          0.06     0.09     0.96        0.24        1.3
   8           F              U          0.03     0.15     0.99        0.39        1.975
   9           F              F          0.01     0.49     1           0.88        4.4
   3           H              F          0        0.12     1           1           5
   5           U              U          0        0        1           1           5
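The cost column of the sorted classifier performance matrix can be reproduced with a short sketch. It assumes that, with no non-declarations, the total cost reduces to $P(H)\,C_{FN}\,P(FN_{[j]}) + P(F)\,C_{FP}\,P(FP_{[j]})$, which agrees with the tabulated values; the variable and function names are illustrative only.

```python
# (state j, P(S|H), P(S|F)) in likelihood-ratio order, from the notional example
rows = [
    (4, 0.23, 0.00), (1, 0.46, 0.02), (2, 0.15, 0.06),
    (7, 0.06, 0.07), (6, 0.06, 0.09), (8, 0.03, 0.15),
    (9, 0.01, 0.49), (3, 0.00, 0.12), (5, 0.00, 0.00),
]
C_FN, C_FP = 5.0, 10.0   # costs assumed in the example
P_H = P_F = 0.5          # prior probabilities assumed in the example


def forced_decision_costs(sorted_rows):
    """Cost of declaring the first j ordered states hostile and the rest friend."""
    costs = []
    cum_h = cum_f = 0.0
    for state, p_h, p_f in sorted_rows:
        cum_h += p_h                      # Pcum(S_[j] | H)
        cum_f += p_f                      # Pcum(S_[j] | F)
        p_fn = 1.0 - cum_h                # hostiles left in the friend rule
        p_fp = cum_f                      # friends captured by the hostile rule
        costs.append((state, P_H * C_FN * p_fn + P_F * C_FP * p_fp))
    return costs


costs = forced_decision_costs(rows)
best_index = min(range(len(costs)), key=lambda i: costs[i][1])
hostile_states = [state for state, _ in costs[: best_index + 1]]
print(hostile_states, costs[best_index][1])
```

The minimum falls at the third ordered state, so the first three ordered states form the hostile rule and the remainder form the friend rule.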

The total cost of misclassification ($C_T$) in Table 3-8 shows that the top three states are to be entered in the hostile rule, leaving the remaining states for declaration as friend. This is the least expensive set of rules possible while only declaring targets friend or hostile (currently $nd_j = 0$ for all $j = 1, 2, \ldots, N$). The heuristic to this point has followed the logic of Ralston (1998). From the above example, the associated hostile and friend rules are shown in Table 3-9.
Table 3-9: Optimal Hostile and Friend rules.

State j   Classifier 1   Classifier 2   P(S|H)   P(S|F)   C_T    Hostile   Friend   H       F
          Indication     Indication
-------   ------------   ------------   ------   ------   ----   -------   ------   -----   -----
   4           U              H          0.23     0.00     1.93      1        0      h[1]    f[1]
   1           H              H          0.46     0.02     0.88      1        0      h[2]    f[2]
   2           H              U          0.15     0.06     0.80      1        0      h[3]    f[3]
=================================================================================================
   7           F              H          0.06     0.07     1.00      0        1      h[4]    f[4]
   6           U              F          0.06     0.09     1.30      0        1      h[5]    f[5]
   8           F              U          0.03     0.15     1.98      0        1      h[6]    f[6]
   9           F              F          0.01     0.49     4.40      0        1      h[7]    f[7]
   3           H              F          0.00     0.12     5.00      0        1      h[8]    f[8]
   5           U              U          0.00     0.00     5.00      0        1      h[9]    f[9]

Notice that the state in which both classifiers non-declare a target (U-U) falls into the friend rule for now, but it accounts for zero cost. This element will become part of the non-declared rule at the end of this heuristic. The double line in the table shows where the decision to add or remove the next state of the hostile or friend rules would increase the total cost. Now that the baseline cost has been determined, the heuristic for finding the best combination of hostile, friend and non-declared rules can be utilized. This is accomplished by testing the removal of different states from both the hostile and friend rules for use in the non-declared rule. Hostile rule elements $h_{[j]}$ will be removed in sequence, based upon their likelihood ratios, and the cost will be calculated with element $[j]$ being part of the non-declaration rule. For simplicity, define a new index for rule elements based on their order statistic from the likelihood ratio calculations. Define $H$, $F$, and $ND$ as the hostile, friend and non-declared rules respectively. Let $h_{[k]}$ be the $k$th ordered element of the hostile rule $H$ where TCM is at a minimum and $f_{[m]}$ be the $(k+1)$th ordered element of $F$ based on the order statistic index, where $[k] = \arg\min_{[j]:\, h_{[j]} = 1} (LR_{[j]})$ and $[m] = \arg\max_{[j]:\, f_{[j]} = 1} (LR_{[j]})$. The element $h_{[j]}$ associated with the smallest likelihood ratio will be the first candidate for removal; likewise, the first friend element tested is the element $f_{[j]} = 1$ associated with the largest likelihood ratio. Let $C_{Ti}$ represent the temporary total cost of misclassification $i$ calculated using equation (3-2). The ordered element contributing the largest decrease in total cost will be included in the non-declared rule. This process will be iterated until either $nd_{[j]} = 1$ for all $[j]$ or the addition of another element to the non-declared rule would increase the minimum total cost. From the example in Table 3-9, the first hostile element to be tested is $h_{[3]}$, where $h_{[3]} = 0$ and $nd_{[3]} = 1$; the first friend element for comparison is $f_{[4]}$, where $f_{[4]} = 0$ and $nd_{[4]} = 1$. The total cost function will decide whether either state should remain part of the non-declared rule or not. The cost calculations in the algorithm that follows are based upon equation (3-2). The non-declared rule is incremented as follows:

1. Create non-declared rule initialized to zero, $ND = [0, 0, \ldots, 0]^T$

2. Find $C_{T,LR}$ from equation (3-2)

3. Test $h_{[k]}$ element of the hostile rule versus $f_{[m]}$ element of the friend rule

   a. Set $h_{[k]} = 0$, $nd_{[k]} = 1$

   b. Calculate $C_{T1}$ from equation (3-2)

   c. Set $h_{[k]} = 1$, $nd_{[k]} = 0$, $f_{[m]} = 0$, $nd_{[m]} = 1$

   d. Calculate $C_{T2}$ from equation (3-2)

   e. Set $nd_{[m]} = 0$, $f_{[m]} = 1$

4. If $\min(C_{T1}, C_{T2}) < C_{T,LR}$

   a. If $C_{T1} \le C_{T2}$

      i. Set $nd_{[k]} = 1$, $h_{[k]} = 0$, $C_{T,LR} = C_{T1}$, $C_{T1} = 0$, $[k] = [k] - 1$

      ii. Return to step 3

5. Else if $C_{T2} < C_{T1}$

   a. Set $nd_{[m]} = 1$, $f_{[m]} = 0$, $C_{T,LR} = C_{T2}$, $C_{T2} = 0$, $[m] = [m] + 1$

6. Return to step 3

Else if $C_{T,LR} < \min(C_{T1}, C_{T2})$

   If any $P(S_j \mid H) = P(S_j \mid F) = 0$, $nd_j = 1$, $h_j = 1$, $f_j = 1$
End

Once the above algorithm is completed, the expected minimum cost rule set for the specified grid point is created. The total cost found from this method will be denoted $C_{T,LR}$. This method could be incorporated into the non-declaration heuristic in Figure 3-2, starting at step 2, to find the global minimum cost rules. The flowchart in Figure 3-4 further encapsulates the likelihood ratio process for a given grid point.
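The non-declaration steps can be sketched as a greedy loop over the likelihood-ratio ordering. Equation (3-2) is not reproduced in this section, so the sketch assumes a cost of the form $P(H)[C_{FN} P(\text{F declared} \mid H) + C_{ND} P(\text{ND} \mid H)] + P(F)[C_{FP} P(\text{H declared} \mid F) + C_{ND} P(\text{ND} \mid F)]$; that form, the value $C_{ND} = 1$, and all names below are assumptions for illustration, not the implementation used in this research.

```python
# (state j, P(S|H), P(S|F)) in likelihood-ratio order, from the notional example
rows = [
    (4, 0.23, 0.00), (1, 0.46, 0.02), (2, 0.15, 0.06),
    (7, 0.06, 0.07), (6, 0.06, 0.09), (8, 0.03, 0.15),
    (9, 0.01, 0.49), (3, 0.00, 0.12), (5, 0.00, 0.00),
]
P_H = P_F = 0.5
C_FN, C_FP = 5.0, 10.0
C_ND = 1.0  # hypothetical non-declaration cost; not given for this example


def total_cost(labels):
    """Assumed stand-in for equation (3-2): expected cost of a labelled rule set."""
    cost = 0.0
    for (_, p_h, p_f), label in zip(rows, labels):
        if label == "H":                  # friends declared hostile
            cost += P_F * C_FP * p_f
        elif label == "F":                # hostiles declared friend
            cost += P_H * C_FN * p_h
        else:                             # non-declared exemplars of either class
            cost += C_ND * (P_H * p_h + P_F * p_f)
    return cost


def add_non_declarations(labels):
    """Greedily move boundary states into the non-declared rule while cost drops."""
    labels = list(labels)
    hostile = [i for i, lab in enumerate(labels) if lab == "H"]
    k = max(hostile) if hostile else -1   # hostile element with the smallest LR
    m = k + 1                             # friend element with the largest LR
    best = total_cost(labels)
    while True:
        c1 = c2 = float("inf")
        if k >= 0:
            c1 = total_cost(labels[:k] + ["ND"] + labels[k + 1:])
        if m < len(labels):
            c2 = total_cost(labels[:m] + ["ND"] + labels[m + 1:])
        if min(c1, c2) >= best:           # no candidate lowers the cost: stop
            break
        if c1 <= c2:
            labels[k], best, k = "ND", c1, k - 1
        else:
            labels[m], best, m = "ND", c2, m + 1
    # States never observed under either class end up non-declared (zero cost).
    for i, (_, p_h, p_f) in enumerate(rows):
        if p_h == 0 and p_f == 0:
            labels[i] = "ND"
    return labels, best


baseline = ["H"] * 3 + ["F"] * 6          # forced-decision rules from the example
final_labels, final_cost = add_non_declarations(baseline)
print(final_labels, final_cost)
```

Under these assumed costs the loop moves three boundary states into the non-declared rule before any further move would raise the cost; a different $C_{ND}$ would generally give a different rule set.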
Calculate $P(S_j \mid H)$ and $P(S_j \mid F)$
↓
If any $P(S_j \mid F) = 0$, set $P_{temp}(S_j \mid F) = \varepsilon$
↓
$LR_j = P(S_j \mid H) / P_{temp}(S_j \mid F)$
↓
Sort $LR_j$ from largest to smallest, $LR_j = LR_{[j]}$
↓
Calculate $P_{cum}(S_j \mid H)$, $P_{cum}(S_j \mid F)$
↓
Calculate $C_j$ for all $j$, $C_{[k]} = \min(C) = C_T$, $[m] = [k] + 1$

Figure 3-4: ISOC Fusion rule selection through likelihood ratios.


3.2.5  RSM  Cost  experiment 

It became readily apparent that there were some subjective areas of this research that could affect optimal rules and costs. This subjective nature was tested through the use of a response surface methodology (RSM) experiment. The costs of misclassification were based on a best guess, but changes could drastically affect the outcome of the different experiments. These costs and their interactions with correlation, both intra-correlation and inter-correlation, were considered. The sample size was held constant at 250 exemplars per class. In addition, the grid points tested were $\delta_i^j = (j - 1) \times 0.1$ for $i = 1, 2$ and $j = 1, \ldots, 6$. This was considered enough grid points to get a clear picture of the sample space. The optimal rules for each grid point were calculated using the likelihood ratio heuristic ($IND_{LR}$); these rules remained relatively constant across all grid points.

The problem was designed as a full factorial with five factors. The ranges for the variables are as follows: $\rho_{inter} \in \{0, 0.8\}$, $\rho_{intra} \in \{0, 0.9\}$, $C_{FP} \in \{10, 20\}$, $C_{FN} \in \{5, 9\}$, $C_{ND} \in \{1, 4\}$. The ranges for the costs were designed to ensure the following generally accepted inequality: $C_{FP} > C_{FN} \gg C_{ND}$. The experimental response variable was $C_{T,LR}$.
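A two-level full factorial over five factors has $2^5 = 32$ design points. As a minimal sketch (the dictionary keys are illustrative names, not identifiers from this research), the design can be enumerated as follows:

```python
from itertools import product

# Low and high levels for each of the five factors
factors = {
    "rho_inter": (0.0, 0.8),
    "rho_intra": (0.0, 0.9),
    "C_FP": (10, 20),
    "C_FN": (5, 9),
    "C_ND": (1, 4),
}

# Every combination of levels is one design point of the full factorial.
design = [dict(zip(factors, levels)) for levels in product(*factors.values())]

print(len(design))
print(design[0])
```

Because the lowest false positive cost exceeds the highest false negative cost, and the lowest false negative cost exceeds the highest non-declaration cost, every design point respects the ordering of the three costs.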

3.2.6  PNN  Non-Declaration  Fusion  Method  (NFnd) 

The PNN fusion method takes the posteriors output from the classifiers and uses them as features. One-third of the posterior probabilities from the test set are fed to the PNN to train the network. The validation set plus the remaining two-thirds of the test set exemplars are used for validation. The spread parameter was tested to achieve the highest possible accuracy. The outputs from the PNN were then put through an additional radial basis function and turned into posterior probabilities using Bayes' rule. Decisions for classifications were once again made based on $T \pm \delta^i$, although there is now only one indifference window ($\delta^i = (i - 1) \times 0.5$ for $i = 1, \ldots, 11$) under consideration. As stated earlier, for all $i$ and $k$:

    if $P_k > T + \delta^i$, then the exemplar is labeled "H";
    if $P_k < T - \delta^i$, then the exemplar is labeled "F";
    else the exemplar is labeled "ND".

Some validation set exemplars were such outliers that their associated PNN activations were zero. These were automatically assigned equal posterior probabilities of class membership, $P_k = 0.5$. When $i = 1$, $\delta^i = 0$ and these exemplars were forced to be "F" classifications, since the cost of a false negative was considered to be more acceptable than the cost of a false positive. When $i > 1$, $\delta^i > 0$ and these exemplars with equal posterior probabilities fell within the indifference window and were classified as "ND". Note that this method allows "H", "F" and "ND" indications. Figure 3-5 shows the PNN non-declaration classifier fusion process $NF_{ND}$ as used in this research.
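The labeling rule with an indifference window can be sketched as a small function. The function name `declare`, the default threshold of 0.5, and the reading of $P_k$ as the posterior probability of the hostile class are assumptions made for illustration.

```python
def declare(p_hostile, threshold=0.5, delta=0.0):
    """Label an exemplar "H", "F", or "ND" from its fused posterior probability."""
    if p_hostile > threshold + delta:
        return "H"
    if p_hostile < threshold - delta:
        return "F"
    if delta == 0:
        # With no indifference window a decision is forced; a posterior sitting
        # exactly on the threshold goes to "F", the less costly error.
        return "F"
    return "ND"


# An outlier with zero PNN activation is assigned a posterior of 0.5.
print(declare(0.5, delta=0.0))
print(declare(0.5, delta=0.1))
```

Widening `delta` can only turn "H" or "F" labels into "ND" labels, so the percentage of non-declared exemplars never decreases as the window grows.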


[Figure: PNN Sensor Fusion Process]

Figure 3-5: PNN Classifier Fusion Process (NFnd).

The  total  cost  of  misclassification  is  again  used  as  a  cost  function  and  allows  a 
comparison  between  methods  considered  in  this  research.  Total  costs  are  compared  in 
the  results  section  to  find  optimal  indifference  windows  for  particular  runs.  Accuracy 
and  the  percent  of  non-declared  will  also  be  addressed. 
3.3 Summary


This  chapter  introduced  the  framework  for  this  research.  The  experimental  design 
was  explained  as  laid  out  in  Leap  (2004).  The  forced  decision  ISOC  method  was 
introduced  as  well  as  a  heuristic  to  adapt  the  forced  decision  ISOC  to  incorporate  a  non- 
declared  rule.  The  ISOC  method  using  likelihood  ratios  was  described.  Next,  the  RSM 
experiment  considering  cost  ranges  and  their  interaction  with  correlation  levels  on  TCM 
was  introduced.  Finally,  the  PNN  fusion  method  was  discussed.  Table  3-10  shows  the 
methods  introduced  in  this  chapter  with  their  associated  acronyms  and  cost  labels. 


Table 3-10: Acronyms and Costs.

Method                                            Acronym   Cost
-----------------------------------------------   -------   -------
ISOC                                              ISOC      C_T,I
ISOC Forced Decision                              IFD       C_T,FD
ISOC Non-Declarations "Political Correctness"     IND_PC    C_T,PC
ISOC Non-Declarations Likelihood Ratio            IND_LR    C_T,LR
PNN Non-Declarations                              NF_ND     C_T,NND
4. Findings and Analysis


4.1  Introduction 

In  this  chapter,  the  three  ISOC  fusion  heuristics  developed  in  Chapter  3  were 
compared  and  contrasted  to  the  PNN  fusion  method  by  their  classification  accuracies  and 
costs  found  by  executing  the  problems  described  in  Table  2-6.  Findings  are  also 
introduced  for  the  RSM  study  investigating  the  interactions  between  cost  and  correlation. 

4.2  General  Findings 

After a thorough investigation of the different methods, some common results were found. The first was a consistent relationship among the total costs of misclassification achieved. It will be shown later in this section that $C_{T,FD} > C_{T,PC} > C_{T,NND}$ in a statistically significant sense, where $C_{T,FD}$ is the TCM for the ISOC forced decision method (IFD), $C_{T,PC}$ is the TCM associated with the ISOC non-declaration political correctness heuristic ($IND_{PC}$), and $C_{T,NND}$ represents the TCM for the PNN non-declarations fusion method ($NF_{ND}$). The second result displayed the difference between method assumptions: the ISOC methods assume independence and do not react much to correlation, while neural networks such as PNN fusion methods do not assume independence and react accordingly. This further supports the research of Storm et al. (2003) and Leap et al. (2004). It was discovered in problems 5 and 6 that the classifiers chosen were unable to adequately label exemplars, creating disconcerting findings. The next sections of this chapter will step through each problem individually and further explore the analysis performed for this research effort.
4.3 Problem 1 Results: 4 Feature Case

This data set was applied to the fusion processes described in Chapter 3. Problem 1 costs were consistent with the above inequality $C_{T,FD} > C_{T,PC} > C_{T,NND}$. Problem 1 demonstrated the ability of $IND_{PC}$ to reach a common optimum hostile rule as sample size was increased for a set correlation $\rho$. Table 4-1 shows the optimal hostile rules of the ISOC non-declaration political correctness heuristic for a set sample size of 25 exemplars per class and $\rho = 0$. The indications are from classifier 1 (linear discriminant function) and classifier 2 (quadratic discriminant function) (L-Q), respectively. The run denotes the different random seeds used.


Table 4-1: INDPC Optimal Hostile Rules (Sample Size = 25, ρ = 0).
Indications (Linear - Quadratic)

Run   H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
---   ---   ---   ---   ---   ---   ---   ---   ---   ---
 1     1     1     0     0     0     0     0     0     0
 2     1     1     0     0     0     0     1     0     0
 3     1     0     1     1     0     0     0     0     0
 4     1     1     0     0     0     0     0     0     0
 5     1     1     1     0     0     0     1     0     0

The hostile rules above vary across random number streams. As sample size is increased, the optimal rule set becomes constant. In fact, using a sample size of 500 exemplars per class, the optimal hostile rules for $IND_{PC}$ are consistent, declaring a target as hostile only if both classifiers indicate that it is hostile. Table 4-2 compares the optimal hostile rules over 5 random seeds with a sample size of 500 exemplars per class.
Table 4-2: INDPC Optimal Hostile Rules (Sample Size = 500, ρ = 0).
Indications (Linear - Quadratic)

Run   H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
---   ---   ---   ---   ---   ---   ---   ---   ---   ---
 1     1     0     0     0     0     0     0     0     0
 2     1     0     0     0     0     0     0     0     0
 3     1     0     0     0     0     0     0     0     0
 4     1     0     0     0     0     0     0     0     0
 5     1     0     0     0     0     0     0     0     0

The same results were noticed when varying sample size for $\rho = 0.9$. Table 4-3 shows the variable hostile rules found for a sample size of 25 exemplars per class.

Table 4-3: INDPC Optimal Hostile Rules (Sample Size = 25, ρ = 0.9).
Indications (Linear - Quadratic)

Run   H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
---   ---   ---   ---   ---   ---   ---   ---   ---   ---
 1     1     0     0     0     0     0     0     0     0
 2     1     0     0     0     0     0     1     0     0
 3     1     0     0     1     0     0     1     0     0
 4     1     1     0     0     0     0     1     0     0
 5     0     1     0     1     0     0     1     0     0

Some of the most obvious states are confused with the addition of inter-correlation. The output state in which both classifiers declare a target as hostile is not included in some of the optimal hostile rules. Once sample size is sufficiently increased to 500 exemplars per class, the rules stabilize to two common optimal hostile rules as shown in Table 4-4.
Table 4-4: INDPC Optimal Hostile Rules (Sample Size = 500, ρ = 0.9).
Indications (Linear - Quadratic)

Run   H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
---   ---   ---   ---   ---   ---   ---   ---   ---   ---
 1     1     0     0     0     0     0     1     0     0
 2     1     0     0     0     0     0     1     0     0
 3     1     0     0     0     0     0     0     0     0
 4     1     0     0     0     0     0     1     0     0
 5     1     0     0     0     0     0     0     0     0

Thus, it can be inferred that as sample size increases, the optimal hostile rule becomes less variable for $IND_{PC}$. There were some volatile states, but overall the rule steadied itself. The other problems also demonstrated this characteristic.

The indifference windows became more stable as sample size increased in this problem. This is a useful result for locating optimal grid point settings for the chosen rule set. Figure 4-1 shows a histogram of $\delta_1$ (indifference window on classifier 1) varied from sample sizes of 25 to 500. The optimal indifference window falls around 0.3 with high sample size. With a sample size of 25, it is much more difficult to determine an optimal grid point setting from the histogram.


[Figure: two histograms, panels "Sample Size = 25" and "Sample Size = 500"; horizontal axis: indifference window δ]

Figure 4-1: Indifference Window Comparison.


This problem was relatively easily separable and, as a result, the $IND_{PC}$ (ISOC non-declaration political correctness heuristic) was comparable to the PNN fusion $NF_{ND}$ and there was no statistical difference between the two methods. Table 4-5 shows the paired t-tests conducted on the difference in mean costs between the different methods while varying sample size and $\rho$. $C_{T,FD}$ represents the costs achieved through the ISOC forced decision heuristic; $C_{T,PC}$ represents the costs from the ISOC non-declaration political correctness heuristic; and $C_{T,NND}$ accounts for the total cost of misclassification found through PNN fusion allowing non-declarations. The highlighted rows in Table 4-5 represent the only differences in mean costs for which the null hypothesis was not rejected. All other rows in the table were statistically different at significance level $\alpha = 0.05$.
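Table 4-5: Paired t-tests of Difference in Mean Costs.

Methods              Sample Size   ρ     p-value   Lower CI      Upper CI      t-statistic   t-critical
------------------   -----------   ---   -------   -----------   -----------   -----------   ----------
[illegible]          All           All   0.006     0.0073        Inf           2.5153        ±1.9622
C_T,FD - C_T,PC      25            0.0   0         [illegible]   [illegible]   5.9           ±2.0017
C_T,FD - C_T,PC      25            0.9   0         [illegible]   [illegible]   4.4           ±2.0017
C_T,FD - C_T,NND     25            0.0   0         [illegible]   0.9           5.9           ±2.0017
C_T,FD - C_T,NND     25            0.9   0         0.5           1.2           5.1           ±2.0017
C_T,PC - C_T,NND     25            0.0   0.6       -0.1          0.2           0.5           ±2.0017
C_T,PC - C_T,NND     25            0.9   0.1       0             0.3           1.6           ±2.0017
C_T,FD - C_T,PC      500           0.0   0         0.5           0.6           14.4          ±2.0017
C_T,FD - C_T,PC      500           0.9   0         0.6           0.9           12.0          ±2.0017
C_T,FD - C_T,NND     500           0.0   0         0.5           0.6           14.4          ±2.0017
C_T,FD - C_T,NND     500           0.9   0         0.6           0.9           11.4          ±2.0017
C_T,PC - C_T,NND     500           0.0   0         0.1           0.1           8.7           ±2.0017
C_T,PC - C_T,NND     500           0.9   0         -0.1          0.0           -2.4          ±2.0017

[Cells marked "[illegible]" could not be read in the source. In the $C_{T,FD} - C_{T,NND}$ row for sample size 25 and ρ = 0.0, only one confidence bound (0.9) is legible; it is shown under Upper CI, but its column is uncertain.]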

Note that $IND_{PC}$ was statistically less than $NF_{ND}$ for a sample size of 500 and $\rho = 0.9$, as shown in the last row of the table, although this is a small difference in a practical sense. There were instances such as this when the ISOC heuristics were able to outperform the neural networks, but in general the opposite remained true.

Cost and accuracy were compared through the use of a parametric analysis. The parametric analysis compared average costs and accuracies across all sample sizes for ISOC, IFD, $IND_{PC}$, NF, and $NF_{ND}$. Since lower costs were preferred, the average costs were transformed by

$$\tilde{C}_i = \frac{(1 - C_i)}{\max_i (C_i)}.$$

Each method generates a score through the following equation: $(1 - \alpha) \times \tilde{C}_i + \alpha \times A_i$, where $A_i$ represents the average accuracy for method $i$ and $\alpha$ is varied along the x-axis to compare the methods. When $\alpha = 0$, cost is the most important factor and when $\alpha = 1$, accuracy becomes the most important. Otherwise, $\alpha$ represents the weighting for the two factors being compared. Figure 4-2 displays these average scores varied across $\alpha$ for no correlation.
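The weighted score itself is a simple convex combination of transformed cost and accuracy. The sketch below takes already-transformed costs as inputs; the two methods and their values are hypothetical placeholders, not results from this research.

```python
def parametric_score(alpha, transformed_cost, accuracy):
    """Weighted score: alpha = 0 considers only cost, alpha = 1 only accuracy."""
    return (1 - alpha) * transformed_cost + alpha * accuracy


# Sweep the weighting alpha from 0 to 1 in steps of 0.1.
alphas = [i / 10 for i in range(11)]

# Hypothetical (transformed cost, average accuracy) pairs for two methods
methods = {"method_a": (0.60, 0.90), "method_b": (0.80, 0.85)}

curves = {
    name: [parametric_score(a, cost, acc) for a in alphas]
    for name, (cost, acc) in methods.items()
}
print(curves["method_a"][0], curves["method_a"][-1])
```

Plotting each curve against `alphas` gives one line per method; the method with the highest line at a given weighting is preferred there.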
[Figure: All Methods Parametric Analysis]

Figure 4-2: Parametric Analysis of Cost and Accuracy (ρ = 0.0).

$IND_{PC}$ outperforms all other methods across all weightings $\alpha$, although there is no statistical difference between $IND_{PC}$ and $NF_{ND}$. The above analysis varies when correlation increases. Figure 4-3 shows the five methods with high levels of correlation.

[Figure: All Methods Parametric Analysis]

Figure 4-3: Parametric Analysis of Cost and Accuracy (ρ = 0.9).

The above parametric analysis shows that the PNN fusion methods react to changes in levels of correlation where the ISOC fusion methods do not. $IND_{PC}$ is now statistically better than all other methods across all weightings of cost and accuracy for problem 1.


4.4  Problem  2  Results:  8  Feature  Case 


This data set was applied to the fusion processes described in Chapter 3. This problem will be used to show the cost inequality described in the general findings section above ($C_{T,FD} > C_{T,PC} > C_{T,NND}$). Table 4-6 shows the costs as sample size and correlation were varied.
Table 4-6: ISOC Heuristics Vs. NFnd Costs.

Run   Sample Size   ρ      C_T,FD        C_T,PC        C_T,NND
---   -----------   ----   -----------   -----------   -------
 1    25            0.00   [illegible]   [illegible]   0.75
 1    25            0.20   [illegible]   [illegible]   0.69
 1    25            0.40   2.50          0.92          0.35
 1    25            0.60   1.96          1.00          0.67
 1    25            0.80   2.32          1.00          0.77
 1    25            0.90   2.26          0.90          1.00
 1    50            0.00   1.83          0.78          0.73
 1    50            0.20   1.89          0.95          0.60
 1    50            0.40   2.00          0.92          0.69
 1    50            0.60   2.50          0.94          0.72
 1    50            0.80   2.24          0.90          0.90
 1    50            0.90   2.25          0.90          0.87
 5    500           0.80   2.28          0.97          0.78
 5    500           0.90   2.24          0.93          0.87
 5    1000          0.00   1.88          0.95          0.66
 5    1000          0.20   1.96          0.94          0.67
 5    1000          0.40   2.15          0.94          0.75
 5    1000          0.60   2.07          0.93          0.78
 5    1000          0.80   2.20          0.93          0.81
 5    1000          0.90   2.33          0.95          0.83

Notice that the above inequality holds for most instances, although there are a few times when $C_{T,PC} < C_{T,NND}$ in the table. Further inspection of the costs through the use of paired t-tests establishes that the means for the outputs of the three classifiers are statistically different, with $C_{T,PC} > C_{T,NND}$ holding as stated above. Table 4-7 displays the results of the two-tailed paired t-tests conducted on $C_{T,PC}$ and $C_{T,NND}$. Sample size was varied across all sample sizes tested, but $\rho$ was displayed at the extremes $\rho = \{0, 0.9\}$. The null hypothesis tested is that the means are equal. The p-value displays the probability of observing a result at least as extreme as the given result by chance, given that the null hypothesis is true; p-values less than $\alpha = 0.05$ indicate a statistically significant difference in means. The high and low CIs in the table denote the confidence interval on the mean of $C_{T,PC} - C_{T,NND}$. A confidence interval that contains zero represents an observed result that fails to reject the null hypothesis. Finally, the t statistic and t critical value compare the difference in means. A t statistic that is greater in magnitude than the t critical value results in rejecting the null hypothesis that there is no difference between the means.
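The mechanics of a paired t-test can be illustrated with a handful of cost pairs. The sketch below uses only the six run 5, sample size 1000 rows of Table 4-6, so it does not reproduce the tests reported in this chapter, which pooled many more paired observations; the critical value 2.571 is the standard two-tailed value for $\alpha = 0.05$ with 5 degrees of freedom.

```python
from math import sqrt
from statistics import mean, stdev

# C_T,PC and C_T,NND for run 5, sample size 1000 (rho = 0.0 to 0.9), Table 4-6
c_pc = [0.95, 0.94, 0.94, 0.93, 0.93, 0.95]
c_nnd = [0.66, 0.67, 0.75, 0.78, 0.81, 0.83]

T_CRITICAL = 2.571  # two-tailed, alpha = 0.05, 5 degrees of freedom


def paired_t_statistic(x, y):
    """t statistic for the mean of the paired differences x - y."""
    diffs = [a - b for a, b in zip(x, y)]
    standard_error = stdev(diffs) / sqrt(len(diffs))
    return mean(diffs) / standard_error


t_stat = paired_t_statistic(c_pc, c_nnd)
reject_null = abs(t_stat) > T_CRITICAL
print(round(t_stat, 2), reject_null)
```

A positive t statistic larger than the critical value means the first cost is significantly greater than the second for these pairs.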


Table 4-7: Paired T-Test of C_T,NND and C_T,PC Problem 2 Costs.

Sample Size   ρ     p-value   Low CI   High CI   t-statistic   t-critical
-----------   ---   -------   ------   -------   -----------   ----------
25            0.0   0.12      -0.05    0.37      1.75          ±2.011
50            0.0   0.03      0.03     0.34      2.72          ±2.011
100           0.0   0.00      0.22     0.30      14.42         ±2.011
250           0.0   0.00      0.17     0.32      7.70          ±2.011
500           0.0   0.00      0.25     0.32      19.37         ±2.011
1000          0.0   0.00      0.27     0.34      18.29         ±2.011
25            0.9   0.05      0.00     0.54      2.33          ±2.011
50            0.9   0.02      0.03     0.19      3.06          ±2.011
100           0.9   0.00      0.10     0.22      6.35          ±2.011
250           0.9   0.00      0.07     0.17      5.20          ±2.011
500           0.9   0.01      0.03     0.14      3.49          ±2.011
1000          0.9   0.00      0.10     0.18      7.71          ±2.011

Only one instance in the table fails to reject the null hypothesis, at sample size = 25 and $\rho = 0$. All of the other observations show a statistical difference between the means of $C_{T,NND}$ and $C_{T,PC}$. A paired t-test of all of the observations from problem 2 also shows that the means for all three costs are statistically different, as shown in Table 4-7. Thus, it can be inferred that the cost relationship holds under differing levels of correlation. A graphical representation of the results from the paired t-tests can be seen in Figure 4-4, Figure 4-5, and Figure 4-6. ISOC non-declaration political correctness heuristic costs ($IND_{PC}$), ISOC forced decision costs (IFD) and PNN fusion ($NF_{ND}$) costs are paired across sample sizes and levels of correlation $\rho$. Clusters of points above the 45° line indicate that Y values dominate X; X values dominate Y when the majority of the paired observations fall below the 45° line. Figure 4-4 graphically depicts the cost relationship between PNN fusion allowing non-declarations and the ISOC non-declaration political correctness heuristic.


[Figure: scatter plot of paired costs; one axis labeled "INDPC Costs"]

Figure 4-4: Problem 2 INDPC Heuristic and NFnd Fusion Ordered Pairs.

The ISOC non-declaration political correctness heuristic costs are always greater than the PNN fusion allowing non-declarations. The centroid of the cluster of data falls at (0.92, 0.72), which represents the paired means of the data. Continuing with this graphical approach in Figure 4-5, it is clear that the ISOC forced decision heuristic falls short of the other two methods because its costs completely dominate the other approaches. Figure 4-5 shows ISOC forced decision paired with the ISOC non-declaration political correctness heuristic.

Figure 4-5: Problem 2 IFD and INDPC Ordered Pairs.

The centroid for this pairing of data falls at (2.05, 0.92) and it is clear that the ISOC forced decision costs are much higher than the ISOC non-declaration political correctness heuristic costs. Following the logic of the original inequality suggested, $C_{T,FD} > C_{T,PC} > C_{T,NND}$, the plot of ISOC forced decision costs versus PNN fusion costs should and does tell the same story. ISOC forced decision is the most costly method of labeling targets, as seen again in Figure 4-6.

Figure 4-6: Problem 2 IFD and NFnd Ordered Pairs.

The centroid for Figure 4-6 is located at (2.05, 0.72), again showing the cost relationship holding true to the inequality.

Problem 2 also followed the pattern displayed for problem 1 of a global hostile rule being reached as sample size was increased. In fact, problem 2 reached a single hostile rule comprised of state (H-H) for a sample size of 500 exemplars per class regardless of the amount of correlation introduced.
4.5 Problem 3: 8 Feature with Autocorrelation Case

This data set was applied to the fusion processes described in Chapter 3. Problem 3 held true to the inequality described in the general findings section, with $C_{T,FD} > C_{T,PC} > C_{T,NND}$. Figure 4-7 shows the average costs incurred from each of the three methods at varied levels of inter-correlation $\rho$ and autocorrelation $\rho_{auto}$.


[Figure: bar chart titled "All Sample Sizes"; vertical axis: cost, 0.00 to 2.50; horizontal axis: (ρ, ρ_auto) at (0.0, 0.0), (0.0, 0.9), (0.9, 0.0), (0.9, 0.9); series: IFD, INDPC, NFnd]

Figure 4-7: Cost Comparisons Between Methods.


The error bars on Figure 4-7 represent a 95% confidence interval of the true mean cost. Consider the ISOC forced decision costs, which represent the largest costs in the above figure. The ISOC forced decision error bars all overlap, showing that the average IFD costs are not statistically different. The error bars on ISOC forced decision are all above the error bars for $IND_{PC}$ and $NF_{ND}$ fusion, showing that ISOC forced decision is statistically more costly than $IND_{PC}$ and $NF_{ND}$ fusion. Next, consider the bars associated with $IND_{PC}$. There is no clear difference between any of the individual error bars. Thus, the ISOC non-declarations political correctness heuristic is robust to correlation in this problem. Finally, consider the PNN allowing non-declarations error bars. There is a statistical difference between $NF_{ND}$ fusion costs when $(\rho, \rho_{auto})$ is changed from (0.9, 0.0) to (0.0, 0.9). The $NF_{ND}$ performed consistently with the exception of the improvement when autocorrelation was high. $IND_{PC}$ and $NF_{ND}$ were statistically different at (0.0, 0.9), further showing that the $NF_{ND}$ was able to perform well at this setting. Otherwise, there was no statistical difference between the two costs. Thus, the inequality stated above holds true ($C_{T,FD} > C_{T,PC} > C_{T,NND}$). When correlation levels were increased and sample size was set at 500 exemplars per class, the hostile rule had a tendency to include the output state ("F", "H") as shown in Table 4-8. This could represent the fact that the correlation of the data is such that the quadratic function is able to discriminate between the classes more often.

Table 4-8: Optimal Hostile Rules for Sample Size = 500, ρ = 0.9.

       Indications (Linear - Quadratic)
Run    H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
1       1     0     0     0     0     0     1     0     0
2       1     0     0     0     0     0     1     0     0
3       1     0     0     0     0     0     0     0     0
4       1     0     0     0     0     0     1     0     0
5       1     0     0     0     0     0     0     0     0

The percentage of non-declarations resulting from NFND was dependent upon the level of correlation. As correlation increased, the NFND indifference window also grew, causing more non-declared exemplars; the result is the increase in costs shown in Figure 4-7. Figure 4-8 plots the optimal indifference window (δ) locations when ρ = 0 against those when ρ = 0.9. A triangle in the plot represents the optimal indifference window size for the PNN with non-declarations, paired between no correlation and 0.9 correlation; each triangle is an instance of sample size and random seed (5 sample sizes x 5 runs = 25 points). The size of the window is in direct relation to the percentage of non-declarations. The figure shows that the PNN fusion method's indifference window increases as correlation increases. Thus, the PNN reacted to increases in correlation while the ISOC fusion methods' declarations were consistent; PNN fusion had to non-declare more exemplars as correlation levels were increased, whereas the ISOC heuristics assumed independence and remained relatively unchanged.

[Figure 4-8: scatter plot of optimal PNN indifference window sizes, ρ = 0.0 against ρ = 0.9; only the axis label "ρ = 0.0" survives in the source.]


Figure  4-8:  PNN  Indifference  Window  Locations. 


The INDPC method's indifference window settings were relatively robust to the correlation. Figure 4-9 compares the optimal locations of the indifference window on the linear classifier while varying inter-correlation. The histograms are very similar, further supporting the robustness of the ISOC fusion methods to correlation due to the assumption of independence.

[Figure 4-9: two histograms, panels ρ = 0.0 and ρ = 0.9; horizontal axis: indifference window setting, 0.0 to 0.5.]


Figure 4-9: Linear Indifference Window Varied by ρ.


This problem also displayed the decrease in accuracy of the PNN fusion with non-declarations method associated with induced levels of correlation; this NFND “weakness” is consistent with the past research efforts of Storm et al. (2003) and Leap et al. (2004), which showed that the PNN reacts to levels of correlation. As the levels of correlation were increased, the NFND method’s ability to classify incoming exemplars was hampered in some instances. The changes in correlation levels had less of an effect on the ISOC non-declaration political correctness heuristic since it assumes independent features. ISOC forced declarations had lower accuracies, as expected, because the method does not allow any fused non-declarations. INDPC and NFND were both able to improve accuracy by non-declaring exemplars with high probabilities of misclassification. Figure 4-10 shows pair-wise accuracies for NFND at the extremes for correlation, (ρ = 0, ρ_auto = 0) versus (ρ = 0.9, ρ_auto = 0.9).

Figure 4-10: NFND Accuracy (No Correlation vs. High Correlation).


Further inspection through a paired t-test shows a statistical difference between the means for PNN fusion with and without correlation (both ρ and ρ_auto). Dividing the space from Figure 4-10 to consider all four extreme points for ρ ∈ {0, 0.9} and ρ_auto ∈ {0, 0.9} generated Figure 4-11.

[Figure 4-11: pairwise scatter plots of PNN fusion accuracy, both axes from 0 to 1, with panels labeled by (ρ, ρ_auto) setting; the panel labels are not legible in the source.]


Figure  4-11:  PNN  Fusion  Accuracy  Pairwise  Comparison  of  Correlation. 


The difference is not as clear in the graphs in Figure 4-11 as it was in the cost comparisons, but two-tailed paired t-tests of all four of the above plots showed that the only statistical difference in accuracies occurred when ρ_auto = 0 and ρ was varied between 0 and 0.9 (the upper left plot of Figure 4-11). The first row of Table 4-9 shows this occurrence with a p-value less than 0.05, a t-statistic greater than the t-critical value, and a confidence
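interval which does not contain 0.

Table 4-9: PNN Fusion Accuracy Pair-wise T-test Results.

Correlation settings          p-value   CI Low   CI High   t-statistic   t-critical
[?]   0.0   0.9   0.0         [?]        0.052   [?]       [?]           ±2.001
[?]   0.0   0.0   0.9         [?]       -0.001   0.075      1.933        ±2.001
[?]   0.9   0.9   0.9         [?]       -0.019   0.074      1.188        ±2.001
[?]   0.0   0.9   0.9         [?]       -0.055   0.023     -0.823        ±2.001

[Entries marked [?] are lost or truncated in the source, including the first settings column and the p-values. In the first row only one interval bound (0.052, placed here under CI Low) and the t-critical value survive.]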

Thus, the PNN was susceptible to inter-correlation when no autocorrelation was present. Otherwise, the PNN accuracies remained statistically similar.
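
The paired comparison reported in Table 4-9 can be illustrated with a minimal sketch. It assumes two equal-length lists of accuracies measured on the same runs under two correlation settings. The function name `paired_t_test`, the sample accuracies, and the critical value used below are illustrative assumptions and are not taken from the thesis data.

```python
import math
import statistics


def paired_t_test(first, second, t_critical):
    """Two-tailed paired t-test summary for two equal-length samples.

    t_critical is supplied by the caller (for example from a t-table),
    so this sketch needs only the standard library.
    """
    if len(first) != len(second) or len(first) < 2:
        raise ValueError("need two equal-length samples with at least 2 pairs")
    diffs = [a - b for a, b in zip(first, second)]
    mean_diff = statistics.mean(diffs)
    std_err = statistics.stdev(diffs) / math.sqrt(len(diffs))
    if std_err == 0:
        raise ValueError("all paired differences are identical")
    t_stat = mean_diff / std_err
    half_width = t_critical * std_err
    return {
        "mean_diff": mean_diff,
        "t_statistic": t_stat,
        "ci_low": mean_diff - half_width,
        "ci_high": mean_diff + half_width,
        # Significant when |t| exceeds the critical value, which is the
        # same condition as the confidence interval excluding 0.
        "significant": abs(t_stat) > t_critical,
    }


# Hypothetical accuracies for the same five runs under two settings.
acc_no_corr = [0.90, 0.88, 0.91, 0.89, 0.92]
acc_high_corr = [0.86, 0.85, 0.88, 0.84, 0.89]

# 2.776 is the two-tailed 5% critical value for 4 degrees of freedom.
result = paired_t_test(acc_no_corr, acc_high_corr, t_critical=2.776)
print(result["significant"], result["ci_low"] > 0)
```

The three criteria quoted for the first row of Table 4-9 (small p-value, t-statistic beyond the critical value, interval excluding 0) are equivalent statements of the same test, which is why the sketch reports only the last two.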

4.6  Problem  4:  8  Feature  Triangle  Case 

This data set was applied to the fusion processes described in Chapter 3. This problem also held the same characteristics as discussed in the general findings section. The costs were found to follow the general inequality CT FD > CT PC > CT NND in a statistical sense through paired t-tests. In addition, the optimal hostile rules followed the generally observed improvement as sample size increased, as shown in problem 1. Further review of these hostile rules showed that the rules were more variable for high levels of correlation; the rules remained relatively constant when there was no correlation present for all sample sizes. Figure 4-12 shows a histogram comparing the optimal hostile rules as ρ is varied. Table 4-10 displays the numbered optimal hostile rules found using
the ISOC non-declaration political correctness heuristic relating to Figure 4-12.

Table 4-10: Problem 4 Numbered Optimal Hostile Rules.

         Hostile Rules
Rule#    H-H   H-U   H-F   U-H   U-U   U-F   F-H   F-U   F-F
2         1     0     0     0     0     0     0     0     0
3         0     1     0     0     0     0     0     0     0
11        1     1     0     0     0     0     0     0     0
13        1     0     0     1     0     0     0     0     0
16        1     0     0     0     0     0     1     0     0
48        1     1     0     1     0     0     0     0     0
51        1     1     0     0     0     0     1     0     0
54        1     0     1     1     0     0     0     0     0

Figure  4-12:  Problem  4  Optimal  Hostile  Rules  Histogram. 


The optimal rule set was constant for ρ ∈ {0.4, 0.6} and very steady for ρ ∈ {0.0, 0.2}. This problem became more separable as correlation increased for both classifiers; the optimal rule went from only declaring an exemplar as hostile when both classifiers labeled it hostile to a more aggressive hostile rule declaring the exemplar hostile when at least one of the classifiers labeled it hostile. The optimal hostile rules were very dependent on the level of correlation in this problem. With no correlation between the feature sets, IND only declared the output state (H-H) as hostile. As correlation increased, more states became part of the hostile rule.
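
A hostile rule of this kind can be read as a nine-entry Boolean vector over the joint output states of the two classifiers. The sketch below is a minimal illustration under that assumption; the rule vectors are copied from Table 4-10, while the names `STATES` and `declares_hostile` and the classifier-1 then classifier-2 ordering of each state label are assumptions made for the example. How a rule treats the non-hostile states (friend versus non-declared) is not shown.

```python
# Joint output states in the column order of Table 4-10
# (assumed here to be classifier 1 label, then classifier 2 label).
STATES = ["H-H", "H-U", "H-F", "U-H", "U-U", "U-F", "F-H", "F-U", "F-F"]

# Two rows copied from Table 4-10.
RULE_2 = [1, 0, 0, 0, 0, 0, 0, 0, 0]   # hostile only when both say hostile
RULE_48 = [1, 1, 0, 1, 0, 0, 0, 0, 0]  # also hostile for H-U and U-H


def declares_hostile(rule, label_1, label_2):
    """Return True if the rule marks the joint output state as hostile."""
    state = f"{label_1}-{label_2}"
    return rule[STATES.index(state)] == 1


print(declares_hostile(RULE_2, "H", "U"))   # False
print(declares_hostile(RULE_48, "H", "U"))  # True
```

Rule 2 is the conservative rule described above for the uncorrelated case, and rule 48 is one of the more aggressive rules that add further states.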

4.7  Problem  5:  8  Feature  XOR  Case 

This data set was applied to the fusion processes described in Chapter 3. The general results held for this problem as well. The ISOC non-declaration and HINDI methods were unable to get adequate separation of the classes. The linear classifier is not suited to separating an XOR problem such as this. Thus, the PNN outperformed all other methods. Figure 4-13 shows a pair-wise comparison of the costs, further supporting the cost inequality stated earlier.

Figure 4-13: Marginal Costs.


Also, the box plots in Figure 4-14 show the same result. The box plots show that the average costs of INDPC and NFND are different. The left box plot clearly shows that the lower 50th percentile of the difference between INDPC and NFND is greater than zero. Paired t-tests also showed the inequality CT FD > CT PC > CT NND to hold true once again.

Figure 4-14: INDPC vs NFND Fusion Box Plots.

With the linear classifier’s inability to separate the groups, the linear indifference window returned unexpected results. When the sample size was increased from 25 to 500, the optimal delta window setting decreased. This is counterintuitive because the linear discriminant function is a poor classifier for an XOR problem, and logic would suggest that the indifference window would be large, allowing more individual non-declarations from the classifier. Figure 4-15 shows a histogram of the linear indifference window setting as sample size was varied from 25 to 500.


[Figure 4-15: two histograms of linear indifference window settings, panels Sample Size = 25 and Sample Size = 500.]


Figure 4-15: Linear Classifier Indifference Window Histogram.

Due to the poor classification accuracy of the linear classifier, this problem and problem 6 were both repeated using a PNN as classifier 1 and the quadratic discriminant function as classifier 2. Unfortunately, using the PNN as an individual classifier did not improve performance. The class means were too close to allow for any true separation.
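
The difficulty a linear classifier has with XOR-type data can be illustrated on the textbook two-dimensional XOR pattern. This is a simplified stand-in, not the 8-feature data set used in this problem. The sketch searches a coarse grid of linear decision boundaries and reports the largest number of the four XOR points that any of them classifies correctly; the grid range and step are arbitrary choices made for the example.

```python
from itertools import product

# Two-dimensional XOR: class 1 when exactly one coordinate is 1.
POINTS = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]


def correct_count(w1, w2, b):
    """Number of XOR points classified correctly by the rule
    'class 1 if w1*x1 + w2*x2 + b > 0, else class 0'."""
    hits = 0
    for (x1, x2), label in POINTS:
        predicted = 1 if w1 * x1 + w2 * x2 + b > 0 else 0
        hits += predicted == label
    return hits


# Candidate weights and biases from -2.0 to 2.0 in steps of 0.25.
grid = [k / 4 for k in range(-8, 9)]
best = max(correct_count(w1, w2, b) for w1, w2, b in product(grid, repeat=3))
print(best)  # 3
```

No line classifies all four points correctly, because the two inequalities required for the class-1 points and the two required for the class-0 points contradict each other when summed. The best any linear boundary can do is three of four.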


4.8  Problem  6:  8  Feature  XOR  with  Autocorrelation  Case 




This data set was applied to the fusion processes described in Chapter 3. As stated for problem 5, the linear classifier could not achieve a reasonable accuracy, and the fused results suffered. Results were typical but are not reviewed here because the classifiers were poor.


4.9  Problem  2:  8  Feature  Case  RSM  Study 


The case study compared 5 factors in a full factorial experiment using INDLR and NFND as described in Chapter 3. The 5 factors and their settings are displayed in Table 4-11.


Table 4-11: RSM Factor Settings.

Factor        - (Low)   + (High)
A - CFP         10        20
B - CFN          5         9
C - CND          1         4
D - ρ            0         0.9
E - ρ_auto       0         0.9

The ISOC likelihood ratio heuristic was shown to be robust to correlation. The models found were not dependent on any levels of correlation. In fact, the main factor for all models tested was factor C, the cost of non-declaration. This factor proved to be 5 to 10 times more important than any other factor based on the sums of squares. Table 4-12 shows the factors for each model in terms of their importance by sums of squares. The
models are generated for all possible indifference windows as described in Chapter 3.

Table 4-12: RSM Results for INDLR.

                     Factors and Interactions
                     in Order of Importance
delta 1   delta 2    1     2     3     4      Adj R2    RMSE
0         0          C     -     -     -      0.9864    0.03
0         0.1        C     A     -     -      0.9857    0.03
0         0.2        C     A     -     -      0.9885    0.02
0         0.3        C     A     -     -      0.9880    0.03
0         0.4        C     A     -     -      0.9892    0.03
0         0.5        C     B     -     -      0.9857    0.03
0.1       0          C     B     A     -      0.9454    0.05
0.1       0.1        C     B     A     -      0.8752    0.07
0.1       0.2        C     B     A     -      0.8688    0.07
0.1       0.3        C     A     B     -      0.8529    0.08
0.1       0.4        C     B     A     -      0.8632    0.08
0.1       0.5        C     -     -     -      0.9988    0.01
0.2       0          C     -     -     -      0.9470    0.05
0.2       0.1        C     B     A     -      0.8753    0.07
0.2       0.2        C     B     A     -      0.8650    0.07
0.2       0.3        C     B     A     -      0.8511    0.08
0.2       0.4        C     B     A     -      0.8681    0.08
0.2       0.5        C     -     -     -      0.9948    0.02
0.3       0          C     -     -     -      0.9556    0.05
0.3       0.1        C     B     A     -      0.8790    0.07
0.3       0.2        C     B     A     -      0.8807    0.07
0.3       0.3        C     B     A     -      0.8741    0.07
0.3       0.4        C     B     A     -      0.8895    0.07
0.3       0.5        C     B     BC    -      0.9968    0.01
0.4       0          C     -     -     -      0.9415    0.06
0.4       0.1        C     A     B     AC     0.9166    0.06
0.4       0.2        C     B     A     AC     0.9145    0.06
0.4       0.3        C     B     A     BC     0.9000    0.06
0.4       0.4        C     B     BC    -      0.9133    0.07
0.4       0.5        C     B     BC    -      0.9982    0.01
0.5       0          C     -     -     -      1.0000    0.00
0.5       0.1        C     -     -     -      1.0000    0.00
0.5       0.2        C     -     -     -      1.0000    0.00
0.5       0.3        C     -     -     -      1.0000    0.00
0.5       0.4        C     -     -     -      0.9983    0.01
0.5       0.5        C     -     -     -      1.0000    0.00

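The NFND technique was found to react to the correlation as shown in Table 4-13.

Table 4-13: NFND RSM Results.

Window    Factors and Interactions in Order of Importance    Adj R2    RMSE
0.4       C                                                  0.86      0.48
0.5       C                                                  1.00      0.00

[The rows for the smaller indifference windows are garbled in the source and their layout cannot be recovered. The surviving window labels are 0 and 0.1; the surviving factor entries, in extraction order, are A, A, D, C, E, D, B, E, DE, B, DE, C, C, A; the surviving numeric entries are 0.56, 0.70, 0.55, 0.57.]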

Factors D (ρ) and E (ρ_auto) were found to be important in the first two models, although as the indifference window size increased, the model became dependent on the cost of non-declaration only. The models also became more predictive as the window increased due to the associated increasing probability of non-declarations.
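
The ranking of factors by sums of squares in a two-level full factorial experiment can be sketched as follows. The factor settings are copied from Table 4-11; the response values are invented purely to make the example run and are not thesis results. The function names and the restriction to main effects are assumptions of this sketch.

```python
from itertools import product

# Low and high settings copied from Table 4-11.
FACTORS = {
    "A": (10, 20),    # cost of a false positive
    "B": (5, 9),      # cost of a false negative
    "C": (1, 4),      # cost of a non-declaration
    "D": (0.0, 0.9),  # inter-correlation
    "E": (0.0, 0.9),  # autocorrelation
}
NAMES = list(FACTORS)

# Coded design: every combination of -1 (low) and +1 (high), 2**5 = 32 runs.
design = list(product((-1, 1), repeat=len(NAMES)))


def natural_settings(run):
    """Translate a coded run into the factor settings of Table 4-11."""
    return {name: FACTORS[name][(level + 1) // 2] for name, level in zip(NAMES, run)}


def main_effect_sums_of_squares(responses):
    """Sum of squares of each main effect in an unreplicated 2^k design,
    computed as (contrast squared) / (number of runs)."""
    n = len(design)
    sums = {}
    for j, name in enumerate(NAMES):
        contrast = sum(run[j] * y for run, y in zip(design, responses))
        sums[name] = contrast ** 2 / n
    return sums


# Hypothetical response (NOT thesis data): cost driven mostly by factor C.
responses = [3.0 * run[2] + 0.5 * run[0] for run in design]
ss = main_effect_sums_of_squares(responses)
ranking = sorted(ss, key=ss.get, reverse=True)
print(ranking[:2])  # ['C', 'A']
```

With this invented response, factor C dominates the sums of squares and the correlation factors D and E contribute nothing, which mirrors the qualitative pattern reported for the larger indifference windows.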

4.10 Problem 2: 8 Feature Case Likelihood Ratio Study

This section attempted to show that INDLR returned optimal rule sets as described in Chapter 3. Rule sets were generated and compared for problems 2 and 4 through complete enumeration of ISOC allowing non-declarations and through INDLR. This was accomplished for δ_j = (i - 1) × 0.05 for j = 1, 2 and i = 1, 2, ..., 11. INDLR rules were then compared to the optimal rules found through complete enumeration of ISOC allowing non-declarations. These optimal rules always contained the INDLR optimal rule, showing that the likelihood ratio heuristic can reach an optimal rule set for the problems tested.
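
The grid of indifference window settings used for the enumeration can be written out explicitly. The sketch below assumes, as the index ranges above suggest, that each of the two individual classifiers takes the 11 settings 0.00, 0.05, ..., 0.50 and that every pair of settings forms one grid point; the variable names are illustrative.

```python
from itertools import product

STEP = 0.05

# delta = (i - 1) * 0.05 for i = 1, ..., 11, rounded to avoid float noise.
deltas = [round((i - 1) * STEP, 2) for i in range(1, 12)]

# One grid point per pair of individual-classifier window settings.
grid_points = list(product(deltas, repeat=2))

print(len(deltas), len(grid_points))  # 11 121
```

Complete enumeration then means evaluating every candidate rule set at each of these 121 grid points, which is the baseline the heuristic's rule is checked against.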

4.11  Summary 

This chapter summarized the findings of this research. The heuristics described in Chapter 3 were compared to each other and to neural networks. Costs and accuracies were compared.

5. Conclusions and Recommendations


5.1  Introduction 

This  chapter  briefly  reviews  the  research.  The  literature  review,  methodology  and 
findings  and  analysis  are  each  discussed.  The  important  findings  of  this  research  process 
as  a  whole  are  reiterated  and  future  research  ideas  are  introduced. 

5.2  Literature  Review 

Fratricide is unacceptable to the American public today. Air Force guidance supports this opinion by directing the use of fusion for improved accuracy and reliability of classifiers, thus limiting the possibility of friendly fire incidents (AFPAM 14-210, 1998).

Two  fusion  methods  considered  in  this  research  were  the  Identification  System 
Operating  Characteristic  (ISOC)  and  Probabilistic  Neural  Network  (PNN)  fusion.  The 
ISOC  assumes  information  is  independent  while  the  PNN  makes  no  such  assumption. 
While  the  assumptions  are  clear,  the  effects  due  to  dependence  and  correlation  are  not 
(Willett,  et  al.,  2000). 

The data used for this experiment came from Leap (2004) as described in section 2.7. The data used in this research employed varied levels of correlation across 6 problems differing in geometry, mean vectors, and correlations ρ and ρ_auto.

5.3  Methodology 

Indifference windows were introduced both at the individual classifier level and at the fused classifier level; ISOC methods utilized indifference windows at the individual classifier level while the neural networks allowed non-declarations at the fused level. The decision threshold was held constant at T = 0.5 to limit the number of variables in this research; this seemed reasonable as the a priori class sizes were assumed equal. Three separate ISOC heuristics were developed for the addition of non-declarations both at the individual classifier level and at the fused indication level. Non-declaration heuristics were compared and contrasted through a cost function. IFD only allowed individual classifier non-declarations while INDPC and INDLR allowed fused non-declared indications. As a result, IFD non-declaration probabilities P(NDis) were constants for a given grid point (δ1, δ2). INDPC and INDLR non-declared probabilities were based on fused non-declared indications and varied based upon both the rule set used and the grid point location. A PNN fusion method was also developed to include non-declarations. This was used as a benchmark to compare the heuristics. A post-optimality analysis of the costs was performed to determine if there were significant interactions between costs and levels of correlation.
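
The interplay between an indifference window and a cost function can be illustrated with a minimal sketch. Several details here are assumptions rather than the definitions used in this research: the window is taken to be the interval [T - δ, T + δ] around the threshold, the cost function is taken to be a simple per-exemplar sum of false positive, false negative, and non-declaration costs, a false positive is taken to mean a friend declared hostile, and the posterior values are invented. The cost values are the low settings from Table 4-11.

```python
T = 0.5  # decision threshold, held constant in this research


def declare(posterior_hostile, delta):
    """Label one exemplar 'H', 'F', or 'U' (non-declared).

    Assumption: the indifference window is [T - delta, T + delta].
    """
    if abs(posterior_hostile - T) <= delta:
        return "U"
    return "H" if posterior_hostile > T else "F"


def average_cost(posteriors, truths, delta, c_fp, c_fn, c_nd):
    """Average cost per exemplar under an assumed additive cost function.

    Assumption: a false positive is a friend declared hostile and a
    false negative is a hostile declared friend.
    """
    total = 0.0
    for p, truth in zip(posteriors, truths):
        label = declare(p, delta)
        if label == "U":
            total += c_nd
        elif label == "H" and truth == "F":
            total += c_fp
        elif label == "F" and truth == "H":
            total += c_fn
    return total / len(posteriors)


# Hypothetical posteriors and true classes (not thesis data).
posteriors = [0.95, 0.58, 0.45, 0.10, 0.55]
truths = ["H", "H", "H", "F", "F"]

forced = average_cost(posteriors, truths, delta=0.0, c_fp=10, c_fn=5, c_nd=1)
windowed = average_cost(posteriors, truths, delta=0.1, c_fp=10, c_fn=5, c_nd=1)
print(forced, windowed)  # 3.0 0.6
```

In this toy case the window removes both misclassifications at the price of three cheap non-declarations, so the windowed cost is lower. Raising `c_nd` reverses the comparison, which is consistent with the sensitivity to non-declaration cost discussed in this chapter.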

5.4  Findings 

The heuristics developed above were comparable to the PNN fusion method. The ISOC methods continued to be robust to correlation with respect to accuracy and cost, although costs were affected in some instances. ISOC fusion methods became more stable when sample size was sufficiently large; optimal individual classifier indifference window settings converged to a specific grid point location and optimal rule sets converged to a single set. The PNN fusion method's accuracies reacted to the introduction of correlation, which was consistent with findings in Leap (2004). NFND and INDPC were comparable methods of classification with similar costs, accuracies, and non-declaration probabilities.

5.5 Implications

This research developed and tested heuristic approaches to finding the optimal Boolean fusion for a set of classifiers allowing non-declarations. The cost of non-declarations was found to be the overriding factor within the cost space as specified by ACC. Non-declaration schemes, such as those suggested in this research, have operating points which are less costly than forced decision methods, but these solutions are sensitive to the cost of a non-declaration. Boolean fusion can find solutions which are comparable to those of feature level fusion methods when non-declarations are allowed. Feature level fusion requires a higher level of understanding than decision level fusion. An operator can understand individual sensor indications leading to a fused label, while feature level fusion techniques such as neural networks require the testing of several parameters. These parameters can have a great effect on the fused indications out of the network. As a result, decision level fusion with non-declarations can be utilized to avoid inconsistencies while still reaching a satisfactory level of performance.

5.6  Future  Research 

This research was the next step in determining the effects of correlation on classifier fusion. One area for improvement to this study is the use of real world data for further confirmation. One further step would be to model ISOC fusion for dependent classifiers to consider more real world applications. Another improvement would be the consideration of a PNN compared with the ISOC heuristics described while allowing the PNN to train on an equivalent amount of data to see if the findings are consistent. Finally, the political correctness measure could be applied to the INDLR as well as a difference measure to sort the rankings of states with no occurrences.